ABSTRACT


In nonparametric testing based on ranks, the occurrence of
ties is a fairly common phenomenon. Many methods have been suggested
for assigning ranks to tied observations in such problems. In this
thesis we review and discuss these methods with regard to their relative
merits under different situations. The recent extension of the
asymptotic theory of rank statistics from continuous to discontinuous
distributions has made it possible to calculate the Asymptotic Relative
Efficiency (ARE) of different methods of handling ties. A comparison,
based on ARE, among the three main methods of handling ties, namely the
average scores, midranks and randomized ranks methods, is presented.
CHAPTER 1


INTRODUCTION 


Most nonparametric test procedures based on ranks assume the
continuity of the underlying distributions. When the data consist
of independent observations, this assumption makes the (theoretical)
consideration of ties unnecessary, since then ties occur with
probability zero. In practice, however, ties do occur even when the
underlying distributions may be assumed continuous. This happens for
various reasons, such as rounding-off errors and the limited refinement
of measuring instruments. In the discontinuous case, on the other hand,
ties cannot be ignored even in theoretical considerations. Since the
occurrence of ties is fairly common in most practical data, it should be
of considerable interest to statisticians using nonparametric methods to
study the operating characteristics of various methods of dealing with
tied observations.


In the present study we discuss this problem when the
observations are mutually independent and the underlying distributions are
either continuous or discontinuous (including discrete). As early as
1945 this problem was recognized to be of practical importance, and methods
of treating tied observations in the two-sample test were proposed by Wilcoxon
(1945). Kendall (1970) and Kruskal and Wallis (1952) have also dealt
with this problem in their respective tests. Putter (1955) considers the case
of purely discrete observations and examines the merits and demerits of
randomized vs. non-randomized methods of treating ties. A study by Pratt
(1959) gives intuitive arguments to show that the procedure suggested by
Wilcoxon for dealing with zeros in the Wilcoxon signed rank test has many
shortcomings. Pratt suggested a new method in which the zeros are also
ranked and their ranks are then dropped, whereas zeros are totally ignored
in the procedure proposed by Wilcoxon. (See Conover (1973b) for a detailed
comparison of these two methods.) After Putter's study many papers dealing
with nonparametric tests and their efficiencies in the discrete case have
appeared in the literature. Chanda (1963) obtains the efficiencies of the
Wilcoxon two-sample test under different discrete distributions. Among
others, Buhler (1967), Klotz (1966), Krauth (1971), Taylor (1964) and
Vorlickova (1972) should be mentioned. Conover (1973a) discusses the tests
of randomness and symmetry in the general case (with no continuity
assumption).

In Chapter 2 we give an account of different methods of handling
ties and demonstrate these by means of an example. Some major
nonparametric tests in forms appropriate for continuous distributions are
also described in this chapter. As pointed out earlier, ties in this
situation do not present any problem and may be treated by any of the
three methods described in Section 2.2. In Sections 2.3, 2.4 and 2.5 we
discuss the common rank tests for randomness, symmetry and independence,
respectively. Section 2.6 deals with rank tests for the one-way and two-way
layout Analysis of Variance, and also with a new class of conditional tests
for the two-way layout Analysis of Variance problem as proposed by Hodges
and Lehmann (1962) and discussed at length in Mehra and Sarangi (1967),
Mehra (1968) and Sen (1968).
Section 3.1 of Chapter 3 includes a general treatment of ties
for the various tests discussed in Chapter 2. The conditional means
and variances, given a vector of ties, are given in this section for
large sample approximations. In Sections 3.2, 3.3 and 3.4, respectively,
the Wilcoxon two-sample test, the sign test and the Wilcoxon one-sample
signed rank test are discussed in detail for the case when ties are present
in the data. A review of the literature dealing with common one- and
two-sample tests is also given in these sections.

Chapter 4 deals with the asymptotic efficiencies of the tests
discussed earlier, both with and without the assumption of continuity of
the distribution function. In Section 4.1 we introduce the concept of
asymptotic efficiency. Section 4.2 gives the asymptotic distributions of
the linear rank statistics under different null hypotheses and in the
different cases arising from different methods of handling ties. In Section
4.3 the asymptotic distributions of linear rank statistics under contiguous
alternatives are given. In Section 4.4 expressions for asymptotic
efficiencies under appropriate conditions are derived. In this section
we also show that Putter's (1955) result is a special case of a result
due to Conover (1973a). In Section 4.5 we show with the help of two
examples that, in testing for symmetry using rank tests, neither of the two
methods of handling ties at zero (described later) is uniformly better:
each is superior to the other in some situations.

Finally,  Chapter  5  gives  some  general  remarks  and  conclusions 
which  should  be  of  practical  value  for  users  of  nonparametric  methods. 
CHAPTER 2


TIES  IN  CONTINUOUS  CASE 


In this chapter we introduce some major nonparametric (rank)
tests and describe, for these tests, the various methods of handling ties
proposed and discussed in the literature.


2.1  NOTATION  AND  PRELIMINARIES. 


Let $x^{(i)}$ denote the $i$th smallest coordinate in the $n$-tuple
$x = (x_1, x_2, \ldots, x_n)$, so that $x^{(1)}$ and $x^{(n)}$ denote the minimum
and maximum of the $n$ coordinates, respectively. If $X = (X_1, X_2, \ldots, X_n)$
is the vector of $n$ observations, the statistic $X^{(i)}$ is called the $i$th
order statistic and $X^{(\cdot)} = (X^{(1)}, X^{(2)}, \ldots, X^{(n)})$ is the vector
of order statistics.

Now suppose that with probability one no two coordinates in $X$
coincide; this is the case, for example, when $X_1, X_2, \ldots, X_n$ are
independent observations with common continuous distribution function
$F(x) = P(X \le x)$. Then

(2.1.1)  $R_i(X) = \#\{X\text{'s} \le X_i\}$

is called the rank of $X_i$. Clearly,

(2.1.2)  $X_i = X^{(R_i)}$.

The statistic $R = (R_1, R_2, \ldots, R_n)$ denotes the vector of ranks.


The rank tests under consideration can be divided into four major
groups.

Group 1.  Tests of Randomness
Group 2.  Tests of Symmetry
Group 3.  Tests of Independence
Group 4.  Analysis of Variance Tests.


Let

(2.1.3)  $S = \sum_{i=1}^{n} C_i \, a(R_i)$,

where $a(\cdot)$ is a function on $\{1, 2, \ldots, n\}$ and $C_1, C_2, \ldots, C_n$ are the
so-called regression constants. We shall write $a_i$ for $a(i)$, $i = 1, 2, \ldots, n$;
the $a_i$'s are called rank scores. $S$ is called a linear rank statistic. For
different score functions $a(\cdot)$ and appropriate constants $C_i$, the
statistic (2.1.3) covers most of the test statistics under the above four
groups.
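As an illustration of (2.1.1) and (2.1.3), the following minimal Python sketch computes the rank vector and a linear rank statistic for untied data. The function names `ranks` and `linear_rank_statistic` and the sample values are hypothetical and chosen only for this illustration; the sketch assumes that no two observations coincide.

```python
from typing import Callable, Sequence


def ranks(x: Sequence[float]) -> list:
    """Rank R_i = number of observations <= x_i, as in (2.1.1); assumes no ties."""
    return [sum(1 for xj in x if xj <= xi) for xi in x]


def linear_rank_statistic(x: Sequence[float],
                          c: Sequence[float],
                          a: Callable[[int], float]) -> float:
    """S = sum_i c_i * a(R_i), as in (2.1.3)."""
    return sum(ci * a(ri) for ci, ri in zip(c, ranks(x)))


# Hypothetical data: the first two observations form the "first sample".
x = [3.1, 0.4, 2.2, 5.0, 1.7]
c = [1, 1, 0, 0, 0]

# With scores a(i) = i, S is the sum of the ranks of the first sample.
s = linear_rank_statistic(x, c, a=lambda i: i)
```

Changing only the score function `a` and the constants `c` yields the other statistics of this chapter, which is the sense in which (2.1.3) covers most of them.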


2.2  RANKING  OF  TIES. 

Although the probability of a tie is zero when observations are
independent and have a continuous distribution function, ties do occur in
practice, as stated earlier. The probability distribution of $S$ defined
by (2.1.3), and the properties of tests based on $S$, remain unchanged,
however, when any of the methods described below is used to assign ranks
to tied observations. The following are the three most commonly used
methods of ranking tied observations.


(a)  Randomization  method 

(b)  Averaged  score  method 

(c)  Midrank  method. 


Let

(2.2.1)  $X^{(1)} = \cdots = X^{(\tau_1)} < X^{(\tau_1+1)} = \cdots = X^{(\tau_1+\tau_2)} < \cdots < X^{(\tau_1+\cdots+\tau_{g-1}+1)} = \cdots = X^{(n)}$,

where $\tau_1, \tau_2, \ldots, \tau_g$ are the sizes of ties and $\sum_{j=1}^{g} \tau_j = n$.


(a)  RANDOMIZATION  METHOD. 


In the randomized rank procedure, we assign ranks to tied observations
on the basis of some random experiment in which each permutation
of the tied observations has the same probability of occurrence. This random
experiment is introduced only to deal with tied observations and is in
no way related to the basic experiment. The outcome of this experiment
does, nevertheless, affect the final decision.


(b)  AVERAGED  SCORE  METHOD. 


For a given vector of sizes of ties (to be called the vector of
ties henceforth) $\tau = (\tau_1, \ldots, \tau_g)$ and scores $a_1, a_2, \ldots, a_n$, we shall
introduce averaged scores

(2.2.2)  $\bar a_i = \bar a(i, \tau) = \frac{1}{\tau_k} \sum_{j=\tau_1+\cdots+\tau_{k-1}+1}^{\tau_1+\cdots+\tau_k} a_j$
         if $\tau_1 + \cdots + \tau_{k-1} < i \le \tau_1 + \cdots + \tau_k$.

Then (2.1.3) is modified as

(2.2.3)  $\bar S = \sum_{i=1}^{n} C_i \, \bar a(R_i, \tau)$.


(c)  MIDRANK  (AVERAGE  RANK)  METHOD. 


We assign the midrank to all the observations that are tied. For
example, writing $T_i = \tau_1 + \cdots + \tau_i$ (with $T_0 = 0$), if

$X^{(T_{i-1}+1)} = \cdots = X^{(T_i)}$,

we assign rank $T_{i-1} + \frac{\tau_i + 1}{2}$ to $X^{(T_{i-1}+1)}, \ldots, X^{(T_i)}$, and then also
define scores for half-integer $i$:

(2.2.4)  $a(R_i, \tau) = a\left( \frac{T_{i-1} + T_i + 1}{2} \right)$,

and modifications similar to (2.2.3) can be incorporated.


Now we give an example to illustrate the above three methods.

EXAMPLE 2.1: Let $S$ be as in (2.1.3) with $C_i = 1$ for $i = 1, 2, \ldots, 5$
(and $C_i = 0$ otherwise), $a(i) = \psi\left(\frac{i}{n+1}\right)$, $n = 10$,
and $X = (X_1, X_2, \ldots, X_{10}) = (15, 16, 17, 18, 25, 16, 17, 19, 16, 20)$, where
$\psi(t)$ denotes the quantile function of the standardized normal
distribution function. We have $X_2$, $X_6$ and $X_9$ tied for ranks 2, 3 and
4, and $X_3$ and $X_7$ tied for ranks 5 and 6. In randomization we assign rank
4 to $X_2$, 2 to $X_6$ and 3 to $X_9$; similarly, 5 to $X_3$ and 6 to $X_7$.
Then we have

$S_{(ran)} = -.11$,

$\bar S = S_{(ave)} = -.27$, and

$S_{(mid)} = -.25$.

Although the above three values of $S$ are quite close in this case,
this may not be so in general. In some cases
(see Section 3.1) the midrank and average score procedures are identical.
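The three values in Example 2.1 can be recomputed with a short Python sketch. It assumes, as the quoted values suggest, that $C_i = 0$ for $i > 5$ and that the normal scores are $a(i) = \psi(i/(n+1))$; the helper names (`tie_block`, `averaged_score`, `midrank_score`) are hypothetical, and the randomized ranks are fixed to the particular outcome used in the example rather than drawn at random.

```python
from statistics import NormalDist

x = [15, 16, 17, 18, 25, 16, 17, 19, 16, 20]
n = len(x)
c = [1] * 5 + [0] * 5  # assumed: only the first five observations carry weight


def a(i: float) -> float:
    """Normal score psi(i / (n + 1)); also defined for half-integer i."""
    return NormalDist().inv_cdf(i / (n + 1))


def tie_block(xi):
    """Return (lowest rank, tie size) for the block of values equal to xi."""
    below = sum(1 for v in x if v < xi)
    size = sum(1 for v in x if v == xi)
    return below + 1, size


def averaged_score(xi):
    """Mean of the scores over the ranks occupied by the tie block, cf. (2.2.2)."""
    lo, size = tie_block(xi)
    return sum(a(r) for r in range(lo, lo + size)) / size


def midrank_score(xi):
    """Score evaluated at the midrank of the tie block, cf. (2.2.4)."""
    lo, size = tie_block(xi)
    return a(lo + (size - 1) / 2)


# The particular randomization outcome chosen in the example.
random_ranks = [1, 4, 5, 7, 10, 2, 6, 8, 3, 9]

s_ran = sum(ci * a(r) for ci, r in zip(c, random_ranks))
s_ave = sum(ci * averaged_score(xi) for ci, xi in zip(c, x))
s_mid = sum(ci * midrank_score(xi) for ci, xi in zip(c, x))
print(round(s_ran, 3), round(s_ave, 3), round(s_mid, 3))
```

Under these assumptions the sketch gives roughly -0.114, -0.272 and -0.256, which agrees with the quoted values up to the rounding used in the example.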


2.3  TESTS  OF  RANDOMNESS. 

$H_0$: We shall say that random variables $X_1, X_2, \ldots, X_n$ satisfy the
hypothesis of randomness $H_0$ if they are independent and have a common
distribution function $F(x)$, i.e., if $P = L((X_1, \ldots, X_n))$, then $P \in H_0$ iff

(2.3.1)  $P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) = F(x_1) \cdot F(x_2) \cdots F(x_n)$,
         $-\infty < x_1, x_2, \ldots, x_n < \infty$,

where $F$ is continuous.

$\bar H_0$: When $F$ is arbitrary in $H_0$, we have $\bar H_0$. Clearly
$H_0 \subset \bar H_0$.

Suppose that the observations are recorded under different
conditions. We want to know whether the differences in conditions have
affected their distributions (the $X_i$'s are assumed to be independent). In
other words, we want to test

(2.3.2)  $H_0: F_i = F$, $1 \le i \le n$, for some $F$
         vs.
         $K: F_i \ne F_j$ for some $i \ne j$, $1 \le i, j \le n$.

The alternative, say $K$, may be described as a family of distributions $Q$
not of the form (2.3.1); $K = \{Q\}$. The following results, whose proofs are
elementary, are needed for our later work. Let $\mathcal{R}$ denote the space of all
permutations of $1, 2, \ldots, n$. If

(2.3.3)  $\bar a = \frac{1}{n} \sum_{i=1}^{n} a_i$,  $\bar C = \frac{1}{n} \sum_{i=1}^{n} C_i$

and $R$ is uniformly distributed over $\mathcal{R}$, then

(2.3.4)  $E(S) = \frac{1}{n} \sum_{i=1}^{n} C_i \sum_{i=1}^{n} a_i$

and

         $V(S) = \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \sum_{i=1}^{n} (a_i - \bar a)^2$,

where $S$ is given by (2.1.3).
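The moment formulas for a linear rank statistic under a uniformly distributed rank vector can be checked by brute force for small $n$. The sketch below compares full enumeration over all $n!$ permutations with the standard permutation formulas $E(S) = \frac{1}{n}\sum C_i \sum a_i$ and $V(S) = \frac{1}{n-1}\sum (C_i - \bar C)^2 \sum (a_i - \bar a)^2$; the function names and the choice $n = 5$, $n_1 = 2$ are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(c, a):
    """Mean and variance of S = sum_i c_i * a[R_i] over all equally likely R."""
    values = [sum(ci * a[r] for ci, r in zip(c, perm))
              for perm in permutations(range(len(a)))]
    mean = Fraction(sum(values), len(values))
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def formula_moments(c, a):
    """Closed-form mean and variance of S under the uniform distribution of R."""
    n = len(a)
    c_bar, a_bar = Fraction(sum(c), n), Fraction(sum(a), n)
    mean = Fraction(sum(c) * sum(a), n)
    var = (sum((ci - c_bar) ** 2 for ci in c)
           * sum((ai - a_bar) ** 2 for ai in a)) / (n - 1)
    return mean, var


# Wilcoxon scores with a first sample of size 2 out of n = 5.
c = [1, 1, 0, 0, 0]
a = [1, 2, 3, 4, 5]
enumerated = exact_moments(c, a)
closed_form = formula_moments(c, a)
```

Exact rational arithmetic (`Fraction`) is used so that the two results can be compared for equality without floating-point tolerance.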




Let $X_1, X_2, \ldots, X_{n_1}, X_{n_1+1}, \ldots, X_{n_1+n_2}$ ($n_1 + n_2 = n$) denote two
combined samples of sizes $n_1$ and $n_2$. Let $F_1$ and $F_2$ be the
distribution functions of the populations from which the first and the
second samples are drawn, respectively. We want to test $F_1 = F_2$. Now
suppose $F_1$ and $F_2$ differ in location only, i.e., $F_1(x) = F(x - \mu_1)$
and $F_2(x) = F(x - \mu_2)$ hold for some $F$ and some constants $\mu_1$ and $\mu_2$;
then we have the two-sample location alternative:

(2.3.5)  $q(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n_1} F(x_i - \mu_1) \prod_{j=1}^{n_2} F(x_{n_1+j} - \mu_2)$.

If the sign of $(\mu_1 - \mu_2)$ is known, the alternatives are one-sided, otherwise
two-sided. These alternatives may be summarized as one-sided,
(i) $\mu_1 > \mu_2$, (ii) $\mu_1 < \mu_2$, and two-sided, $\mu_1 \ne \mu_2$.

Now we discuss some of the important tests. For these tests,
tables of exact probability points for small sample sizes are available in
many books, for example, Hollander and Wolfe (1973). For samples of larger
size, approximations are given. Generally, the test procedure consists of
rejecting $H_0$ when $S$ is too large, too small, or either, depending upon
the nature of the alternative.


(a) WILCOXON TWO SAMPLE TEST.

Let $a_i = i$ and $C_i = 1$ for $i \le n_1$ and zero otherwise; then

(2.3.6)  $S = \sum_{i=1}^{n_1} R_i$ = sum of the ranks for the first sample.

By (2.3.4), under $H_0$ we have

(2.3.7)  $E(S) = \frac{1}{2} n_1 (n+1)$,  $V(S) = \frac{1}{12} n_1 n_2 (n+1)$.

We note again that the tied observations may be ranked according to one
of the three procedures described in Section 2.2.
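A minimal Python sketch of the Wilcoxon two-sample statistic follows. It computes the rank sum of the first sample and standardizes it with the null mean $\frac{1}{2} n_1(n+1)$ and variance $\frac{1}{12} n_1 n_2 (n+1)$; the function name and the data are hypothetical, and the sketch assumes untied observations and a sample large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_rank_sum(sample1, sample2):
    """Return (S, z): rank sum of sample1 and its standardized value.

    Assumes no ties in the pooled sample, so the untied null moments apply.
    """
    pooled = list(sample1) + list(sample2)
    n1, n2, n = len(sample1), len(sample2), len(pooled)
    s = sum(sum(1 for v in pooled if v <= xi) for xi in sample1)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 * (n + 1) / 12
    return s, (s - mean) / sqrt(var)


# Hypothetical data for illustration only.
s, z = wilcoxon_rank_sum([1.1, 2.3, 0.7, 1.9], [3.2, 2.8, 4.1, 2.5, 3.9])

# Two-sided p-value from the normal approximation.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
```

For samples as small as these the exact tables would normally be preferred; the normal approximation is shown only to make the role of the two moments concrete.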


(b) MEDIAN TEST.

If

(2.3.8)  $a_i = 0$ if $i \le \frac{1}{2}(n+1)$,
         $a_i = 1$ if $i > \frac{1}{2}(n+1)$,

and $C_i = 1$, $1 \le i \le n_1$ (zero otherwise), then

(2.3.9)  $S = \sum_{i=1}^{n_1} a(R_i) = \sum_{i=1}^{n_1} u(R_i - \frac{1}{2} n - 1)$,

where $u(\cdot)$ is given by

(2.3.10)  $u(x) = 0$, $x < 0$,
          $u(x) = 1$, $x \ge 0$,

is called the median statistic. By (2.3.4), under $H_0$ we have

(2.3.11a)  $E(S) = \frac{1}{2} n_1$ if $n$ is even,
           $E(S) = \frac{1}{2} n_1 \frac{n-1}{n}$ if $n$ is odd, and

(2.3.11b)  $V(S) = \frac{n_1 n_2}{4(n-1)}$ if $n$ is even,
           $V(S) = \frac{n_1 n_2 (n+1)}{4 n^2}$ if $n$ is odd.

REMARK 2.1: Using a combinatorial argument it can be shown that under
$H_0$, $S$ follows the hypergeometric distribution (see, for example, [8]).


(c) THE VAN DER WAERDEN TEST.

If in the Wilcoxon two-sample test we set $a_i = \psi\left(\frac{i}{n+1}\right)$, where $\psi(\cdot)$
is the same as defined in Example 2.1, then we have the Van der Waerden test
statistic

(2.3.12)  $S = \sum_{i=1}^{n_1} \psi\left(\frac{R_i}{n+1}\right)$.

By (2.3.4), under $H_0$ we have

(2.3.13)  $E(S) = 0$,  $V(S) = \frac{n_1 n_2}{n(n-1)} \sum_{i=1}^{n} \psi^2\left(\frac{i}{n+1}\right)$.

REMARK 2.2: In tests (a), (b) and (c), under certain regularity conditions
(see [8]), $\{S - E(S)\}/\{V(S)\}^{1/2}$ follows approximately $N(0,1)$, and hence
when sample sizes are large we can apply this approximation.

(d) KRUSKAL-WALLIS k-SAMPLE TEST.

In the setup of (2.3.5), suppose we have $k$ samples instead of
two and want to test the equality of the $\mu_i$'s, $i = 1, 2, \ldots, k$. Kruskal and
Wallis (1952) have introduced the following test.

Let us rank the pooled sample, and let $R_j$ be the sum of the
ranks of the $j$th sample; then the test statistic is defined by

(2.3.14)  $H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$,

where $n_1 + n_2 + \cdots + n_k = n$. Exact percentage points of $H$ are
given in Kruskal and Wallis (1952) for small $n_i$'s. If the $n_i$'s are
large, $H$ follows approximately the chi-square distribution with $(k-1)$
degrees of freedom (d.f.). The test consists of rejecting the null
hypothesis when $H$ is large.

REMARK 2.3: When $k = 2$, the above test reduces to the Wilcoxon
two-sample test.
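The Kruskal-Wallis statistic $H = \frac{12}{n(n+1)} \sum_i R_i^2/n_i - 3(n+1)$ can be sketched in a few lines of Python. The function name and the three small samples are hypothetical; the sketch assumes that the pooled observations are all distinct, so that each value has a unique rank.

```python
def kruskal_wallis_h(samples):
    """H statistic for a list of samples; assumes no ties in the pooled data."""
    pooled = [v for sample in samples for v in sample]
    n = len(pooled)
    # Rank of each (distinct) value in the pooled sample.
    rank = {v: r for r, v in enumerate(sorted(pooled), start=1)}
    total = sum(sum(rank[v] for v in sample) ** 2 / len(sample)
                for sample in samples)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data: k = 3 samples of size 3 each.
h = kruskal_wallis_h([[1.2, 3.4, 2.2], [4.1, 5.6, 4.8], [0.3, 0.9, 2.9]])
```

For large $n_i$ the value of `h` would be referred to the chi-square distribution with $k - 1$ degrees of freedom; with samples this small the exact tables apply instead.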


2.4  TESTS OF SYMMETRY.

$H_1$: We shall say that the random variables $X_1, X_2, \ldots, X_n$ satisfy
the hypothesis of symmetry $H_1$ if $P \in H_0$, given by (2.3.1), with $F$
satisfying the symmetry condition, i.e.,

(2.4.1a)  $F(x) + F(-x) = 1$ for all $x$.

$\bar H_1$: $P$ satisfies all other conditions of $H_1$ except the
continuity of $F$.

We can confine our attention to testing symmetry with
respect to the origin, as the other cases follow with trivial
modifications. Clearly, under $H_1$, if $f(x) = \frac{dF(x)}{dx}$ exists at a point $x$ we
have

$f(x) = f(-x)$.

The alternative against which $H_1$ is frequently tested is the
shift in median from zero to some point $\Delta$. Let the alternative $K$ be
a family of distributions $Q$ of the form

(2.4.1b)  $Q(X_1 \le x_1, \ldots, X_n \le x_n) = \prod_{i=1}^{n} F(x_i - \Delta)$.

If the sign of $\Delta$ is known, we have one-sided alternatives, otherwise
two-sided. Let

$R_i^+ = \sum_{j=1}^{n} u(|X_i| - |X_j|)$ = rank of $|X_i|$ when the $|X_i|$'s are ranked,

where $u(\cdot)$ is defined by (2.3.10). $S$ takes the following form when
we use it to test $H_1$:

(2.4.2)  $S = \sum_{i=1}^{n} u(X_i) \, a(R_i^+)$.

Under $H_1$, we have

(2.4.3)  $E(S) = \frac{1}{2} \sum_{i=1}^{n} a_i$  and  $V(S) = \frac{1}{4} \sum_{i=1}^{n} a_i^2$.

Now we shall describe some of the important special cases of
the statistic (2.4.2). Tables for tests (a) and (b) may be found in
Hollander and Wolfe [13]. Also, under certain regularity conditions in
tests (a) and (b) (see [8]), $\{S - E(S)\}/\{V(S)\}^{1/2}$ follows $N(0,1)$ as
$n \to \infty$, and hence when $n$ is large normal tables may be used.


(a) SIGN TEST.

Let $a_i = 1$, $1 \le i \le n$. Then (2.4.2) and (2.4.3) give

(2.4.4)  $S = \sum_{i=1}^{n} u(X_i)$ = number of positive observations,

         $E(S) = \frac{n}{2}$,  $V(S) = \frac{n}{4}$  (under $H_1$).

REMARK 2.4: Note that if there are some zero observations, we may follow
one of the two methods outlined in Sections 3.1 and 4.5.


(b) WILCOXON ONE-SAMPLE (SIGNED RANK) TEST.

Set $a_i = i$, $1 \le i \le n$; then we have from (2.4.2) and (2.4.3)

(2.4.5)  $S = \sum_{i=1}^{n} R_i^+ \, u(X_i)$ = sum of ranks of positive observations,

         $E(S) = \frac{1}{4} n(n+1)$  and  $V(S) = \frac{1}{24} n(n+1)(2n+1)$  (under $H_1$).

Remark 2.4 is applicable here also.
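The signed rank statistic, that is, the sum of the ranks of $|X_i|$ over the positive observations, together with its null moments $\frac{1}{4} n(n+1)$ and $\frac{1}{24} n(n+1)(2n+1)$, can be sketched as follows. The function name and the data are hypothetical, and the sketch deliberately assumes no zero observations and no ties among the absolute values, which is the continuous case of this chapter.

```python
def signed_rank(x):
    """Return (S, E(S), V(S)) for the Wilcoxon signed rank statistic.

    Assumes no zeros and no ties among the absolute values.
    """
    n = len(x)
    # Rank of |x_i| among all absolute values.
    abs_rank = [sum(1 for xj in x if abs(xj) <= abs(xi)) for xi in x]
    s = sum(r for r, xi in zip(abs_rank, x) if xi > 0)
    return s, n * (n + 1) / 4, n * (n + 1) * (2 * n + 1) / 24


# Hypothetical data for illustration only.
s, mean, var = signed_rank([1.5, -0.4, 2.1, 0.9, -1.2, 3.0])
```

Setting every score to 1 instead of the rank would give the sign statistic, with null mean $n/2$ and variance $n/4$.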


. 
(c) MEHRA'S TEST FOR PAIRED COMPARISONS (k-sample).

Let us consider a paired comparison experiment involving $k$
treatments. Suppose that the $n_{ij}$ independent comparisons for the pair
$(i,j)$ of treatments ($1 \le i < j \le k$) provide observed comparison
differences $Z_{ij\ell}$ ($\ell = 1, 2, \ldots, n_{ij}$). Let $G_{ij}(z)$ denote their common
distribution function, which is assumed to be continuous. The hypothesis
of equality among the treatments can be expressed as

$H_1'$: $G_{ij}(z) + G_{ij}(-z) = 1$ and $G_{ij}(z) = G_{i'j'}(z)$ for any two
pairs $(i,j)$ and $(i',j')$.

The alternative may be 'not $H_1'$'. We rank the absolute values
of the $n$ ($= \sum\sum_{i<j} n_{ij}$) comparison differences $Z_{ij\ell}$ ($1 \le i < j \le k$,
$\ell = 1, 2, \ldots, n_{ij}$) in a pooled sample. In case of ties we may use one of
the methods described in Section 2.2. Let $r_{ij\ell}$ be the rank of $|Z_{ij\ell}|$
if $Z_{ij\ell} > 0$ and zero otherwise; similarly, let $s_{ij\ell}$ be the rank of
$|Z_{ij\ell}|$ if $Z_{ij\ell} < 0$ and zero otherwise. Also, denote

$R_n^{(i,j)} = \sum_{\ell=1}^{n_{ij}} r_{ij\ell}$  and  $S_n^{(i,j)} = \sum_{\ell=1}^{n_{ij}} s_{ij\ell}$.

Under this set-up Mehra (1964) suggested the following test statistic:

(2.4.6)  $L = 6[(n+1)(2n+1)k]^{-1} \sum_{i=1}^{k} \left\{ \sum_{j \ne i} V_n^{(i,j)} / n_{ij}^{1/2} \right\}^2$,

where $V_n^{(i,j)} = R_n^{(i,j)} - S_n^{(i,j)}$.

The test consists of rejecting $H_1'$ at level $\alpha$ if $L \ge \ell_\alpha$, where $\ell_\alpha$
satisfies $P_{H_1'}[L \ge \ell_\alpha] = \alpha$. No tables for the probability distribution of
$L$ are available for small $n$. For large $n$, however, $L$ is asymptotically
distributed as the chi-square distribution with $(k-1)$ d.f. under
$H_1'$ (see Mehra (1964)).

REMARK 2.5: For $k = 2$, the above test reduces to the Wilcoxon paired
comparison test.


2.5  TESTS OF INDEPENDENCE.

$H_2$: We shall say that the family $\{(X_1, Y_1), \ldots, (X_n, Y_n)\}$
satisfies the hypothesis of independence $H_2$ if all the $2n$ random
variables are mutually independent, the $X_i$'s having an arbitrary continuous
distribution function $F(x)$ and the $Y_i$'s having an arbitrary continuous
distribution function $G(y)$. We may write it as

(2.5.1)  $P(X_1 \le x_1, Y_1 \le y_1, X_2 \le x_2, Y_2 \le y_2, \ldots, X_n \le x_n, Y_n \le y_n) = \prod_{i=1}^{n} F(x_i) G(y_i)$.

$\bar H_2$: The continuity assumption regarding $F(x)$ and $G(y)$ in $H_2$
is dropped. Let us consider alternatives $K = \{Q\}$ against $H_2$, where the
$Q$'s are given by

(2.5.2)  $Q(X_1 \le x_1, Y_1 \le y_1, X_2 \le x_2, Y_2 \le y_2, \ldots, X_n \le x_n, Y_n \le y_n) = \prod_{i=1}^{n} A(x_i, y_i)$,

where $A(x,y)$ is a continuous two-dimensional distribution function. The
pairs $(X_i, Y_i)$ are independent under $Q$, but within pairs there is
dependence. Statistics for testing $H_2$ against alternatives of the
form (2.5.2) are given by

(2.5.3)  $S = \sum_{i=1}^{n} a(R_i) \, b(Q_i)$,

where

$R_i = \sum_{j=1}^{n} u(X_i - X_j)$ = rank of $X_i$ in separate ranking

and

$Q_i = \sum_{j=1}^{n} u(Y_i - Y_j)$ = rank of $Y_i$ in separate ranking;

both scores are non-decreasing, and $u(\cdot)$ is given by (2.3.10). Tables for
the tests described below are given in [13].


(a) SPEARMAN TEST.

Taking the Wilcoxon scores $a_i = b_i = i$, we get from (2.5.3)

(2.5.4)  $S = \sum_{i=1}^{n} R_i Q_i$,

and it can be easily verified that under $H_2$

(2.5.5)  $E(S) = \frac{1}{4} n(n+1)^2$  and  $V(S) = \frac{1}{144} n^2 (n+1)^2 (n-1)$,

and the distribution of $S$ is approximately normal for large values of $n$.
If $S$ is linearly transformed so that the minimum and maximum values
are $-1$ and $+1$ respectively, it is called the Spearman Rank Correlation
Coefficient and is given by

(2.5.6)  $\rho = \frac{12}{n^3 - n} [S - E(S)] = \frac{12}{n^3 - n} \sum_{i=1}^{n} (R_i - \frac{1}{2} n - \frac{1}{2})(Q_i - \frac{1}{2} n - \frac{1}{2})$.

The test consists of rejecting $H_2$ for large $|\rho|$.

REMARK 2.6: If we use median scores as in Section 2.3, the test is
called the Quadrant test.

(b) KENDALL $\tau$-TEST.

Kendall [14] suggests the following statistic for testing $H_2$
(we use $\tau^*$ to distinguish between the vector of ties $\tau$ and Kendall's $\tau$):

(2.5.7)  $\tau^* = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{sign}(R_i - R_j) \, \mathrm{sign}(Q_i - Q_j)$.

The maximum and minimum values of $\tau^*$ are $+1$ and $-1$ respectively.
The test consists of rejecting $H_2$ when $|\tau^*|$ is large. For the large
sample approximation, let us note that $\tau^*$ is a linear function of

(2.5.8)  $K = \sum\sum_{i \ne j} u(R_i - R_j) \, u(Q_i - Q_j)$,

namely, $n(n-1)(1 + \tau^*) = 4K$. Under $H_2$ we have

(2.5.9)  $E(K) = \frac{1}{4} n(n-1)$,  $V(K) = \frac{1}{72} n(n-1)(2n+5)$,

and the distribution of $K$ is approximately normal for large $n$ (see
Kendall [14]).
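The linear relation between Kendall's coefficient and the count $K$ can be checked numerically. The sketch below normalizes the double sum of sign products by $n(n-1)$, the number of ordered pairs $i \ne j$, which is the normalization under which the coefficient ranges over $[-1, +1]$ and $n(n-1)(1+\tau^*) = 4K$ holds. The function names and data are hypothetical; raw observations are used in place of ranks, which leaves the signs of the differences unchanged when there are no ties.

```python
def sign(v):
    """Sign of v as -1, 0 or +1."""
    return (v > 0) - (v < 0)


def kendall(x, y):
    """Return (tau_star, K) for paired data without ties."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    tau = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
              for i, j in pairs) / (n * (n - 1))
    # K counts ordered pairs that are larger in both coordinates.
    k = sum(1 for i, j in pairs if x[i] > x[j] and y[i] > y[j])
    return tau, k


# Hypothetical paired data with two discordant pairs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 1.3, 4.4, 3.8, 5.0]
tau, k = kendall(x, y)
n = len(x)
```

Because $\tau^*$ is linear in $K$, the normal approximation for $K$ with mean $\frac{1}{4} n(n-1)$ and variance $\frac{1}{72} n(n-1)(2n+5)$ carries over directly to $\tau^*$.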


2.6  ANALYSIS OF VARIANCE TESTS.

We have already described the one-way layout Analysis of Variance
design (Kruskal-Wallis test); let us now consider the two-way layout with
$m_{ij}$ independent random observations $X_{ij\ell}$, $\ell = 1, 2, \ldots, m_{ij}$, in the
$(i,j)$th cell, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$. Here $k$ is the number
of treatments and $n$ is the number of blocks. Let $X_{ij\ell}$, $1 \le \ell \le m_{ij}$,
be distributed according to a common continuous distribution function

(2.6.1)  $F_{ij}(x) = F_j(x + \xi_i)$,

where $\xi_i$ may represent the unknown block effects. Then the hypothesis
of equality of the treatment effects (the null hypothesis) $H_3$ can
be defined as

(2.6.2)  $H_3: F_1 = F_2 = \cdots = F_k$.

$\bar H_3$ can be defined as usual by dropping the continuity assumption in $H_3$.
In the following, we shall describe two tests: one based on separate
rankings (Friedman test) and the other based on joint ranking of
observations after 'alignment'.

REMARK 2.7: The model (2.6.1) is used for the aligned rank test only. For
Friedman's test we need the following weaker condition:

$F_{ij} = F_i$ for some $F_i$, $i = 1, 2, \ldots, n$.
(a) FRIEDMAN'S TEST.

Let $m_{ij} = 1$ for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. Within each
block we rank the $k$ observations. Let $r_{ij}$ denote the rank of $X_{ij}$
among $X_{i1}, \ldots, X_{ik}$. Set $R_j = \sum_{i=1}^{n} r_{ij}$, $R_{\cdot j} = R_j / n$ and
$R_{\cdot\cdot} = \frac{k+1}{2}$. Then Friedman's (1937) test statistic is

(2.6.3)  $Q = \frac{12n}{k(k+1)} \sum_{j=1}^{k} (R_{\cdot j} - R_{\cdot\cdot})^2 = \left[ \frac{12}{nk(k+1)} \sum_{j=1}^{k} R_j^2 \right] - 3n(k+1)$.

We reject $H_3$ when $Q$ is large. Tables for exact probability points
are available in Hollander and Wolfe (1973). Under $H_3$, $Q$ has
asymptotically, as $n \to \infty$, the chi-square distribution with $(k-1)$ degrees of
freedom.
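Friedman's statistic can be written either in terms of the deviations of the mean treatment ranks from $(k+1)/2$ or in the computational form $\frac{12}{nk(k+1)} \sum_j R_j^2 - 3n(k+1)$. The following sketch computes both forms so that their agreement can be checked; the function name and data are hypothetical, and it assumes one observation per cell and no ties within a block.

```python
def friedman_q(blocks):
    """Return Q in deviation form and in computational form.

    blocks[i][j] is the observation on treatment j in block i.
    Assumes one observation per cell and no ties within a block.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        for j, v in enumerate(row):
            # Rank of v within its own block.
            rank_sums[j] += sum(1 for w in row if w <= v)
    q_dev = 12 * n / (k * (k + 1)) * sum(
        (r / n - (k + 1) / 2) ** 2 for r in rank_sums)
    q_short = (12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
               - 3 * n * (k + 1))
    return q_dev, q_short


# Hypothetical data: n = 4 blocks, k = 3 treatments.
q_dev, q_short = friedman_q([[1.0, 2.5, 3.1],
                             [0.8, 1.9, 1.2],
                             [2.2, 3.0, 4.5],
                             [1.1, 0.7, 2.0]])
```

The two forms agree because the rank sums always total $nk(k+1)/2$, so expanding the square leaves only the sum of squared rank sums and a constant.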


(b) CONDITIONAL ALIGNED RANK TEST.

The approach here is entirely different from that in Friedman's
test. We use the rank comparison after the 'alignment' (defined below)
as presented in Hodges and Lehmann (1962), Mehra and Sarangi (1967),
Mehra (1968) and Sen (1968).

Alignment essentially means removing the block effects $\xi_i$
($i = 1, 2, \ldots, n$) from the observations by subtracting from each observation
in a block, say the $i$th, some reasonable function $y$ of the observations
in the block which satisfies the condition

(2.6.4)  $y(X_{i11} + a, \ldots, X_{i1m_{i1}} + a, \ldots, X_{ik1} + a, \ldots, X_{ikm_{ik}} + a)$
         $= y(X_{i11}, \ldots, X_{i1m_{i1}}, \ldots, X_{ik1}, \ldots, X_{ikm_{ik}}) + a$.

Let the aligned observations be denoted by $Z_{ij\ell}$, $\ell = 1, 2, \ldots, m_{ij}$,
$1 \le j \le k$, $1 \le i \le n$, and let $r_{ij\ell}$ be the rank of $Z_{ij\ell}$ in a combined
ranking of all the $N = \sum_i N_i = \sum_i \sum_j m_{ij}$ aligned observations. Each
conditional situation (given a set of ranks for each block) is referred to
as a configuration. More precisely, if $r_i^{(1)} < r_i^{(2)} < \cdots < r_i^{(N_i)}$
denote the ordered ranks falling in the $i$th block and
$r_i = (r_i^{(1)}, \ldots, r_i^{(N_i)})$, then the configuration is simply the event
$E = (r_1, r_2, \ldots, r_n)$ in our sample space. Note that the only randomness
(given a configuration) that remains is due to the independent assignments
of ranks to the treatments. Let $a(\cdot)$ be rank scores and

(2.6.5)  $T_{N_j} = \sum_i \sum_\ell a(r_{ij\ell})$ = sum of the rank scores for the $j$th treatment.

Also, let $m_{ij} = m_j$ for all $i$ and $j$ (the complete case, which also
covers equal numbers of observations per cell). Under this scheme the $N_i$'s are
equal to $N' = \sum_j m_j$ and the proposed test function is (see Mehra [21])

(2.6.6)  $L_N = \left[ (N'-1) / \left( N' \sum_{i=1}^{n} \sigma_i^2 \right) \right] \sum_{j=1}^{k} \frac{1}{m_j} \{ T_{N_j} - m_j n \bar a_{\cdot\cdot} \}^2$,

where $\sigma_i^2 = \sum_j \sum_\ell \{ a(r_{ij\ell}) - \bar a_{i\cdot} \}^2 / N_i$ and $\bar a_{\cdot\cdot} = \sum_i \bar a_{i\cdot} / n$. The test
consists of rejecting $H_3$ for large values of $L_N$.
In the general case, when the $m_{ij}$'s do not satisfy the above
condition, whether the design is complete or not, the following statistic
is suggested ([21], (2.7)):

(2.6.7)  $L_N = V' A^{-1} V$,

where $V$ is the vector with components $V_{N_j} = [T_{N_j} - E(T_{N_j})]$, and $A$ is the
exact covariance matrix of $V$ (which is required to be non-singular),
and $E(T_{N_j})$ and $\sigma_{jj'}$, the elements of $A$, are given by

$E(T_{N_j}) = \sum_{i=1}^{n} m_{ij} \, \bar a_{i\cdot}$,

$\sigma_{jj'} = \sum_{i=1}^{n} (N_i \delta_{jj'} - m_{ij'}) \left\{ \frac{m_{ij} \sigma_i^2}{N_i - 1} \right\}$,  $\delta_{jj'} = 1$

or zero according as $j = j'$ or not ($E$ and $\sigma_{jj'}$ represent the
conditional expectation and covariance function under $H_3$ and condition
(2.6.4)).

Computational techniques for the evaluation of the exact
distribution for statistics of the type $L_N$ are discussed in [12] with the
Wilcoxon score and $k = 2$. With the help of computers, exact distribution tables
can be prepared for various values of $k$, $n$ and $m_{ij}$. As $n \to \infty$ the
conditional statistic $L_N$, given a configuration, converges in distribution
(under certain regularity conditions, see [21]) to a chi-square variable
with $(k-1)$ degrees of freedom.

CHAPTER  3 


TIES  IN  DISCONTINUOUS  CASE 


In  this  chapter  we  discuss  the  tests  described  in  Chapter  2 
without  the  continuity  assumption.  Consequently,  in  this  case  ties  may 
occur  with  positive  probability.  Expressions  for  conditional  (given  the 
vector  of  ties)  expectations  and  variances  are  given  for  large  sample 
approximations.  For  explicit  results  on  asymptotic  distributions,  see 
Section  4.2  and  [3]. 

3.1  TREATMENT  OF  TIES  (GENERAL  CASE). 

In  the  following  we  shall  describe  the  three  methods  of  handling 
ties  introduced  in  Section  2.2  for  the  general  case. 

(a)  RANDOMIZATION. 

Let $R^* = (R_1^*, \ldots, R_n^*)$ be the vector of ranks obtained after the randomization procedure (Section 2.2) is applied to the tied observations. The statistic $\sum_{i=1}^{n} C_i\, a(R_i^*)$ has the same distribution under $H_0$ as that of $\sum_{i=1}^{n} C_i\, a(R_i)$ under $H_0$ (similarly for $H_1$, $H_2$ and $H_3$), and so the same tables may be used. The asymptotic convergence of distributions for particular statistics is the same as described in Chapter 2.


(b)  AVERAGED  SCORES.


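(i) Tests of Randomness: Let the scores $\bar a_i$ and the statistic $\bar S$ be the same as in (2.2.2) and (2.2.3).

THEOREM 3.1: Under $H_0$ and arbitrary $\tau$ (the vector of ties) the statistic $\bar S$ satisfies

(3.1.1)
$$E(\bar S \mid \tau) = \frac{1}{n} \sum_{i=1}^{n} C_i \sum_{i=1}^{n} a_i ,$$
$$V(\bar S \mid \tau) = \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \sum_{i=1}^{n} (\bar a_i - \bar a)^2$$
and
$$\sum_{i=1}^{n} (\bar a_i - \bar a)^2 = \sum_{i=1}^{n} (a_i - \bar a)^2 - \sum_{i=1}^{n} (a_i - \bar a_i)^2 .$$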


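PROOF: It is easy to verify that
$$\bar a(R_i) = \bar a(R_i^*)$$
and hence $\bar S$ may be written equivalently as
$$\bar S = \sum_{i=1}^{n} C_i\, \bar a(R_i^*) .$$
According to Theorem 29A of Hajek (1969), the vectors $R^*$ and $\tau$ are independent and hence

(3.1.2)
$$E(\bar S \mid \tau) = E\Big[\sum_{i=1}^{n} C_i\, \bar a(R_i^*, \tau)\Big] = \sum_{i=1}^{n} C_i\, E(\bar a(R_i^*, \tau)) ,$$
where $\tau$ is considered fixed. Now
$$E(\bar a(R_i^*, \tau)) = \frac{1}{n} \sum_{i=1}^{n} \bar a_i(\tau) .$$
Therefore by (3.1.2), and since the averaged scores have the same total as the original scores, we have
$$E(\bar S \mid \tau) = \frac{1}{n} \sum_{i=1}^{n} C_i \sum_{i=1}^{n} a_i .$$
Further,
$$V(\bar S \mid \tau) = V\Big[\sum_{i=1}^{n} C_i\, \bar a(R_i^*, \tau)\Big] = \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \sum_{i=1}^{n} (\bar a_i - \bar a)^2$$
by Theorem 3B of Hajek (1969). Finally, to complete the proof it suffices to show that $\sum_{i=1}^{n} (a_i - \bar a_i)(\bar a_i - \bar a) = 0$. This follows from the fact that $\bar a_i$ is constant on each tied group and equals the average of the $a_i$ over that group, so that
$$\sum_{i=\tau_1+\cdots+\tau_{k-1}+1}^{\tau_1+\cdots+\tau_k} (a_i - \bar a_i)(\bar a_i - \bar a) = \tau_k (\bar a_i - \bar a_i)(\bar a_i - \bar a) = 0 , \quad 1 \le k \le g . \qquad \square$$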


In view of the above theorem, the distribution of $\bar S$ depends upon the vector of ties, and so a different table is needed for each tie configuration. Hajek (1969) suggests that if ties are few, we can use the same table for $\bar S$ as for $S$, noting that the resulting critical levels will be somewhat larger than the exact conditional critical levels.
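In the case of the two-sample Wilcoxon test, using Theorem 3.1, we have
$$E(\bar S \mid \tau) = E(S)$$
and

(3.1.3)
$$V(\bar S \mid \tau) = \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \sum_{i=1}^{n} (\bar a_i - \bar a)^2 .$$

But as $a_i = i$,
$$\sum_{i=1}^{n} (a_i - \bar a_i)^2 = \sum_{k=1}^{g} \sum_{i=\tau_1+\cdots+\tau_{k-1}+1}^{\tau_1+\cdots+\tau_k} \Big( i - \frac{(\tau_1+\cdots+\tau_{k-1}+1) + (\tau_1+\cdots+\tau_k)}{2} \Big)^2$$
$$= \sum_{k=1}^{g} \sum_{i=\tau_1+\cdots+\tau_{k-1}+1}^{\tau_1+\cdots+\tau_k} \Big\{ i - \Big(\tau_1+\cdots+\tau_{k-1} + \frac{\tau_k+1}{2}\Big) \Big\}^2$$
$$= \sum_{k=1}^{g} \sum_{i=1}^{\tau_k} \Big\{ (x+i) - \Big(x + \frac{\tau_k+1}{2}\Big) \Big\}^2 ,$$
where $x = \tau_1 + \cdots + \tau_{k-1}$. Thus

(3.1.4)
$$\sum_{i=1}^{n} (a_i - \bar a_i)^2 = \sum_{k=1}^{g} \frac{1}{12} (\tau_k - 1)\, \tau_k\, (\tau_k + 1) .$$

Therefore, using the last equation of (3.1.1), (3.1.3), (3.1.4) and (2.3.7) we have

(3.1.5)
$$V(\bar S \mid \tau) = \frac{n_1 n_2}{12} \Big[ n + 1 - \frac{\sum_{k=1}^{g} \tau_k (\tau_k^2 - 1)}{n(n-1)} \Big] .$$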
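As a numerical check of (3.1.5), the sketch below evaluates the conditional variance twice: once from the general formula in (3.1.1) with midranks as averaged Wilcoxon scores, and once from the closed form with the tie correction. The helper names and the small pooled sample are hypothetical and chosen only for illustration; the regression constants are assumed to be $C_i = 1$ for the first sample and 0 otherwise.

```python
from collections import Counter
from fractions import Fraction


def midranks(values):
    """Average rank of each observation in the pooled sample (exact fractions)."""
    first = {}  # smallest 1-based position of each distinct value
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    counts = Counter(values)
    # a tied group occupying ranks f, ..., f+t-1 has average rank f + (t-1)/2
    return [Fraction(2 * first[v] + counts[v] - 1, 2) for v in values]


def var_general(values, n1):
    """Conditional variance from (3.1.1), assuming C_i = 1 for the first n1 values."""
    n = len(values)
    a_bar = midranks(values)
    mean_a = sum(a_bar) / n
    mean_c = Fraction(n1, n)
    c = [1] * n1 + [0] * (n - n1)
    ss_c = sum((ci - mean_c) ** 2 for ci in c)
    ss_a = sum((ai - mean_a) ** 2 for ai in a_bar)
    return ss_c * ss_a / (n - 1)


def var_closed_form(values, n1):
    """Closed form (3.1.5) with the tie correction term."""
    n = len(values)
    n2 = n - n1
    tie_term = sum(t * (t * t - 1) for t in Counter(values).values())
    return Fraction(n1 * n2, 12) * (n + 1 - Fraction(tie_term, n * (n - 1)))


# Hypothetical pooled sample: the first four values form the X-sample.
pooled = [1, 2, 2, 3, 3, 3, 4, 5]
print(var_general(pooled, 4), var_closed_form(pooled, 4))
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than up to rounding.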
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Similar  expressions  can  also  be  obtained  in  the  case  of 
Median  and  Van  der  Waerden  tests  for  expected  values  and  conditional 
variances.  For  asymptotic  distributions  of  these  statistics  see  Chapter 
4. 


In the k-sample case, we can show as in the Wilcoxon test (see (3.1.4)) that the variance is reduced by
$$\sum_{i=1}^{g} \frac{1}{12}\, \tau_i (\tau_i - 1)(\tau_i + 1)$$
and hence we modify $H$ (2.3.4) accordingly (see [19]):

(3.1.6)
$$\bar H = 12 \Big[ n(n+1) - \frac{\sum_{i=1}^{g} \tau_i (\tau_i - 1)(\tau_i + 1)}{n-1} \Big]^{-1} \Big[ \sum_{j=1}^{k} \frac{\bar S_j^2}{n_j} - \frac{n(n+1)^2}{4} \Big] .$$

For $\min(n_1, \ldots, n_k) \to \infty$, $\bar H$ has asymptotically a chi-square distribution with (k-1) d.f. Tables for small $n_j$'s are available in [19].


(ii) Tests of Symmetry: In testing $H_1$ there are two types of problems: (1) zero observations, and (2) ties among the non-zero absolute values. Two methods of handling zeros have been proposed, by Wilcoxon (1945) and Pratt (1959). We will compare these two methods with the help of two examples in Section 4.5. Here we use Wilcoxon's method, i.e., deleting the zero observations altogether.


Let $\nu$ be the number of non-zero observations and let us denote them by $\hat X_1, \ldots, \hat X_\nu$. Let $\tau$ be the vector of ties in the sequence $|\hat X_1|, \ldots, |\hat X_\nu|$. Let

(3.1.7)
$$\bar S = \sum_{i=1}^{\nu} u(\hat X_i)\, \bar a(\hat R_i^+) ,$$
where $\hat R_i^+ = \sum_{j=1}^{\nu} u(|\hat X_i| - |\hat X_j|)$, $1 \le i \le \nu$. It is easy to verify that under $H_1$ (see [8], Theorem 30A) we have

(3.1.8)
$$E(\bar S \mid \nu, \tau) = \frac{1}{2} \sum_{i=1}^{\nu} \bar a_i , \qquad V(\bar S \mid \nu, \tau) = \frac{1}{4} \sum_{i=1}^{\nu} \bar a_i^2 ,$$
and
$$\sum_{i=1}^{\nu} \bar a_i^2 = \sum_{i=1}^{\nu} a_i^2 - \sum_{i=1}^{\nu} (a_i - \bar a_i)^2 .$$


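In the case of the sign test, $a_i = 1$ and hence $\bar a_i = 1$ for all $\tau$. Therefore the distribution of $\bar S$ is the same as that of $S$ with $n$ replaced by $\nu$. In the one-sample Wilcoxon test, proceeding in the same way as in the two-sample case, we get

(3.1.9)
$$E(\bar S \mid \nu, \tau) = \frac{\nu(\nu+1)}{4} ,$$
$$V(\bar S \mid \nu, \tau) = \frac{1}{24} \Big[ \nu(\nu+1)(2\nu+1) - \frac{1}{2} \sum_{j} \tau_j (\tau_j - 1)(\tau_j + 1) \Big] .$$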
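The moments in (3.1.9) can be checked by brute force for a small sample. The sketch below assumes that, under the hypothesis of symmetry and given the absolute values, each of the $2^\nu$ sign patterns is equally likely, and that the statistic is the sum of the midranks of the positive observations. The function names and the sample of absolute values are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def midranks(values):
    """Average rank of each value (exact fractions)."""
    first = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
    counts = Counter(values)
    return [Fraction(2 * first[v] + counts[v] - 1, 2) for v in values]


def moments_by_enumeration(abs_values):
    """Mean and variance of the signed-rank sum over all 2^nu sign patterns."""
    ranks = midranks(abs_values)
    sums = [
        sum(r for r, positive in zip(ranks, signs) if positive)
        for signs in product((0, 1), repeat=len(ranks))
    ]
    mean = Fraction(sum(sums), len(sums))
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, var


def moments_by_formula(abs_values):
    """Mean and variance according to (3.1.9)."""
    nu = len(abs_values)
    tie_term = sum(t * (t - 1) * (t + 1) for t in Counter(abs_values).values())
    mean = Fraction(nu * (nu + 1), 4)
    var = Fraction(1, 24) * (nu * (nu + 1) * (2 * nu + 1) - Fraction(tie_term, 2))
    return mean, var


# Hypothetical non-zero absolute values with tied groups of sizes 1, 2 and 3.
abs_values = [1, 2, 2, 3, 3, 3]
print(moments_by_enumeration(abs_values), moments_by_formula(abs_values))
```

The enumeration grows as $2^\nu$, so it is only meant as a check for very small $\nu$.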


For large sample approximations of the above tests see Section 4.2.

For Mehra's k-sample test, we do not have a modified form of L when ties are present. Also, no asymptotic result regarding the distribution of L under the null hypothesis is available.
(iii) Tests of Independence: Let $\tau_x$ and $\tau_y$ be the sizes of ties in $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ respectively. We define $a(i, \tau_x)$ and $a(i, \tau_y)$ by (2.2.2), and

We have under $H_2$

(3.1.10)
$$E(\bar S \mid \tau_x, \tau_y) = \frac{1}{n} \Big( \sum_{i=1}^{n} a_i \Big)^2 \quad \text{and}$$
$$V(\bar S \mid \tau_x, \tau_y) = \frac{1}{n-1} \sum_{i=1}^{n} (a(i, \tau_x) - \bar a)^2 \sum_{i=1}^{n} (a(i, \tau_y) - \bar a)^2 .$$

In the Spearman test with $a_i = i$, we have, under $H_2$,
$$E(\bar S) = E(S)$$
and

(3.1.11)
$$V(\bar S \mid \tau_x, \tau_y) = \frac{1}{144(n-1)} \Big[ n(n+1)(n-1) - \sum_{j=1}^{g_x} \tau_j^x (\tau_j^x + 1)(\tau_j^x - 1) \Big] \Big[ n(n+1)(n-1) - \sum_{j=1}^{g_y} \tau_j^y (\tau_j^y + 1)(\tau_j^y - 1) \Big] .$$

Under certain regularity conditions (see Theorem 31A of [8]) the distribution of $\bar S$, given $\tau_x$ and $\tau_y$, is asymptotically normal.

For Kendall's test, Kendall (1970) suggests the following argument: If there are $\tau_i$ consecutive ties, all the scores arising from any pair chosen from them are zero. There are $\tau_i(\tau_i - 1)$ such pairs and so the sum $n(n-1)$ will be reduced by
$$\sum_{i=1}^{g_x} \tau_i^x (\tau_i^x - 1) \quad \text{and} \quad \sum_{i=1}^{g_y} \tau_i^y (\tau_i^y - 1) .$$
Therefore our alternative form of the coefficient $\tau$ may be written

(3.1.12)
$$\tau^* = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \operatorname{sign}(\bar R_i - \bar R_j)\, \operatorname{sign}(\bar Q_i - \bar Q_j)}{\Big[ \Big( n(n-1) - \sum_{i=1}^{g_x} \tau_i^x (\tau_i^x - 1) \Big) \Big( n(n-1) - \sum_{i=1}^{g_y} \tau_i^y (\tau_i^y - 1) \Big) \Big]^{1/2}} .$$

The  expression  for  conditional  variance  of  K  (defined  by 
(2.5.8))  is  given  in  [13]  for  large  sample  approximation. 


(iv) Analysis of Variance Tests: In the Friedman test, we rank each block separately. Let $g_i$ be the number of tied groups in block $i$ and let $\tau_{i,j}$ represent the size of the $j$th tied group in block $i$. The modified $Q$ statistic (derived similarly to $\bar H$) is given by
$$\bar Q = \Big[ nk(k+1) - \frac{1}{k-1} \sum_{i=1}^{n} \Big\{ \Big( \sum_{j=1}^{g_i} \tau_{i,j}^3 \Big) - k \Big\} \Big]^{-1} 12\, n^2 \sum_{j=1}^{k} (\bar R_{\cdot j} - \bar R_{\cdot\cdot})^2$$
(see [13]). The distribution of $\bar Q$ under $H_3$ is asymptotically chi-square with k-1 d.f.

Unfortunately, no similar results are available in the case of the aligned rank test.
(c)  MIDRANK  METHOD.

Except for the Van der Waerden and aligned rank tests, all the tests have scores which are equal to either ranks or constant values. For these tests the midrank method (see Section 2.2) of handling ties is therefore the same as the average score method.


In the Van der Waerden test, we have the test statistic
$$\sum_{i=1}^{n} C_i\, \Phi^{-1}\Big( \frac{\bar R_i}{n+1} \Big) ,$$
where $\bar R_i$ represents the midranks. The conditional variance is reduced as in the case of average scores, and the asymptotic distribution will be given later (see Section 4.2).


In the aligned rank test, as with the average score method, we do not have a modified form of the test statistic $L_n$ when the midrank method is used.


3.2  TIES  IN  WILCOXON  2-SAMPLE  TEST. 

In this section we derive a nonparametric test, similar to the Wilcoxon 2-sample test, for the case when the underlying distribution is purely discrete. Let us denote the two independent samples by $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ ($n_1 + n_2 = n$) with (discontinuous) distribution functions $F_1$ and $F_2$. We want to test $H_0 : F_1 = F_2 = F$ (say) against location alternatives. It seems reasonable to choose a test based on the following criteria (see [24]):

(i)  distribution free under the hypothesis;

(ii)  depends on observations only; and

(iii)  as close as possible to original Wilcoxon test.

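Let us assume that $F_1$ and $F_2$ have the same discontinuity points and denote them by $\xi_k$, $k = 1, 2, \ldots$. We define

$p_k = P(X_1 = \xi_k)$,  $q_k = P(Y_1 = \xi_k)$;

$u_k$ = # of X's which are equal to $\xi_k$;

$v_k$ = # of Y's which are equal to $\xi_k$;

$w_k = u_k + v_k$;

$U = (u_1, u_2, \ldots)$,  $V = (v_1, v_2, \ldots)$,  $W = (w_1, w_2, \ldots)$.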


The ordered pooled sample is given by the nonzero components of the two vectors U and V. Hence, any rank (order) statistic which depends upon the observations only can be expressed in terms of U and V. According to criterion (ii), the critical region C can be defined by U and V only. Now we show that W is a sufficient statistic for the vector of parameters.


LEMMA 3.1: $P(u \mid w)$ is independent of the $p_k$'s.


PROOF:

(3.2.1)
$$P(u \mid w) = P(U = u,\ V = w - u \mid W = w) = \frac{P(U = u)\, P(V = w - u)}{P(W = w)} .$$

Now,
$$P(U = u) = \sum_{(r_1, \ldots, r_{n_1})} P(U = u \mid X = (r_1, \ldots, r_{n_1})) \cdot P(r_1, \ldots, r_{n_1}) ,$$
where the above sum is over all possible $(r_1, \ldots, r_{n_1})$ in the sample space of $(X_1, \ldots, X_{n_1})$, and
$$P(r_1, \ldots, r_{n_1}) = p_{r_1} \cdot p_{r_2} \cdots p_{r_{n_1}} .$$
Also, the conditional probability in the summation is zero unless $(r_1, \ldots, r_{n_1})$ has exactly $u_1$ of the $\xi_1$'s, $u_2$ of the $\xi_2$'s, etc. Let $(r_1', r_2', \ldots, r_{n_1}')$ satisfy this. We have only $h$ ($1 \le h \le n_1$) $u_k$'s which are nonzero, say $u_{\ell_1}, \ldots, u_{\ell_h}$, such that $\sum_{i=1}^{h} u_{\ell_i} = n_1$. We therefore have
$$\frac{n_1!}{u_{\ell_1}! \cdots u_{\ell_h}!}$$
non-null events in the space. In this situation,
$$P(U = u \mid X = (r_1', \ldots, r_{n_1}')) = 1 .$$

(3.2.2)
$$\therefore\ P(U = u) = \frac{n_1!}{u_{\ell_1}!\, u_{\ell_2}! \cdots u_{\ell_h}!}\; p_{\ell_1}^{u_{\ell_1}} \cdots p_{\ell_h}^{u_{\ell_h}} .$$

Similarly,

(3.2.3)
$$P(V = w - u) = \frac{n_2!}{(w_{j_1} - u_{j_1})! \cdots (w_{j_g} - u_{j_g})!}\; p_{j_1}^{(w_{j_1} - u_{j_1})}\, p_{j_2}^{(w_{j_2} - u_{j_2})} \cdots p_{j_g}^{(w_{j_g} - u_{j_g})} ,$$
where $g \ge h$ and some of the $u_{j}$'s are zero; and

(3.2.4)
$$P(W = w) = \frac{n!}{w_{j_1}! \cdots w_{j_g}!}\; p_{j_1}^{w_{j_1}} \cdots p_{j_g}^{w_{j_g}} .$$

Hence, substituting (3.2.2), (3.2.3) and (3.2.4) in (3.2.1) we have
$$P(u \mid w) = \frac{n_1!\, n_2!\; w_{j_1}! \cdots w_{j_g}!}{n!\; u_{\ell_1}! \cdots u_{\ell_h}!\; (w_{j_1} - u_{j_1})! \cdots (w_{j_g} - u_{j_g})!} \left\{ \frac{\big( p_{\ell_1}^{u_{\ell_1}} \cdots p_{\ell_h}^{u_{\ell_h}} \big) \big( p_{j_1}^{(w_{j_1} - u_{j_1})} \cdots p_{j_g}^{(w_{j_g} - u_{j_g})} \big)}{p_{j_1}^{w_{j_1}} \cdots p_{j_g}^{w_{j_g}}} \right\} .$$
The quantity in the braces is 1. Hence the result. □
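Lemma 3.1 can be illustrated numerically. The sketch below computes $P(u \mid w)$ directly from the three multinomial probabilities in (3.2.1) for two different probability vectors and compares the result with the product-hypergeometric expression $\prod_k \binom{w_k}{u_k} / \binom{n}{n_1}$, which is an algebraic rearrangement of the factorial ratio in the proof. The function names, the counts and the probability vectors are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of the given cell counts."""
    value = Fraction(factorial(sum(counts)))
    for c, p in zip(counts, probs):
        value = value / factorial(c) * p ** c
    return value


def cond_prob(u, w, probs):
    """P(U = u | W = w) computed from (3.2.1) under the hypothesis F1 = F2."""
    v = [wk - uk for wk, uk in zip(w, u)]
    return multinomial_pmf(u, probs) * multinomial_pmf(v, probs) / multinomial_pmf(w, probs)


def hypergeometric_form(u, w):
    """The same conditional probability written without any p_k."""
    numerator = 1
    for wk, uk in zip(w, u):
        numerator *= comb(wk, uk)
    return Fraction(numerator, comb(sum(w), sum(u)))


# Hypothetical configuration: three discontinuity points, n1 = 4, n2 = 5.
w = (3, 2, 4)
u = (1, 2, 1)
probs_a = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
probs_b = (Fraction(1, 5), Fraction(1, 5), Fraction(3, 5))
print(cond_prob(u, w, probs_a), cond_prob(u, w, probs_b), hypergeometric_form(u, w))
```

Both probability vectors give the same conditional probability, which is the content of the lemma.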


Let the size of C be $\alpha$, i.e. $P(C) = \sum_{w} P(W = w)\, P(C \mid W = w) = \alpha$, or
$$\sum_{w} \frac{n!}{w_{j_1}! \cdots w_{j_g}!}\; p_{j_1}^{w_{j_1}} \cdots p_{j_g}^{w_{j_g}} \sum_{(u, w-u) \in C} P(u \mid w) = \alpha .$$
As $P(C)$ has to be independent of the $p_k$'s (requirement (i)), we must have $P(C \mid W = w) = \alpha$ for every $w$, which is the usual condition for distribution-free tests. Since for every fixed $w$ we have only a finite set of values $P(u \mid w)$, and these sets vary with $w$, it will in general be impossible to find a region C with exact size $\alpha$. However, this can be solved by including some sample points in C not definitely, but with a certain given probability.
Now suppose $C_0$ is the rejection region $[S > a]$, of the same size $\alpha$, given by the 'randomized' Wilcoxon test. Then we have $P(C) = P(C_0) = \alpha$ or, $P(C \cap \bar C_0) = P(\bar C \cap C_0)$, where $\bar A$ stands for the complement of $A$. One possible interpretation of (iii) above is to choose C such that $P(C \cap \bar C_0)$ is minimized. This may be justified as follows: Suppose F is really continuous and ties occur only because of lack of precision of measurements. In this case the randomized test is approximately equal to the Wilcoxon test (as the randomization procedure is similar to the effect of replacing each discontinuity by an interval of the uniform distribution). It is therefore appropriate to minimize the probability of getting a result different from that of the randomized test ([24]). This probability, when the hypothesis is true, is
$$P(C \cap \bar C_0) + P(\bar C \cap C_0) = 2 P(C \cap \bar C_0) .$$
The above is achieved if we minimize
$$P(C \cap [S \le a] \mid W = w) = \sum_{(u, w-u) \in C} P(u \mid w)\, P(S \le a \mid U = u, V = w - u)$$
for every $w$, where S is the same as in (2.2.6), under the condition

(3.2.5)
$$\sum_{(u, w-u) \in C} P(u \mid w) = P(C \mid W = w) = \alpha .$$

As suggested by Putter (1955), we can use the following algorithm. For every $w$, we order all possible vectors $(u, v) = (u, w - u)$ by the magnitude of $P(S \le a \mid U = u, V = w - u)$. We take the vector with the smallest probability, then the next smallest, etc., until the (conditional) size $\alpha$, as in (3.2.5), is reached. Doing this for all $w$ we get the desired C.

Unfortunately, the above derived test seems to be very difficult to apply and so we modify it, without going far from the test, as follows. Instead of rejecting the hypothesis when $P(S \le a \mid U = u, V = w - u)$ is too small we reject it when $E(S \mid U = u, V = w - u)$ is too large ([24]). Let

(3.2.6)  $S' = E_r(S \mid U, V)$,

where $E_r$ denotes the expectation under randomization. It is easy to see that $S'$ is the same as the $\bar S$ we had in Section 3.1. Hence under criteria (i), (ii) and (iii) we have derived a test which is the same as the average score test we proposed earlier. However, the cutoff point does depend upon W and the tabulation involved is prohibitive. Klotz (1966) has given an algorithm to calculate the exact distribution given a vector of ties. It is also suggested to use the computer for calculating the significance probabilities. We will compare this test with the randomized test in Chapter 4.
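The identity between $S' = E_r(S \mid U, V)$ in (3.2.6) and the midrank (average score) statistic can be checked by enumeration on a small example. The sketch below assumes that randomization means assigning the ranks within each tied group in a uniformly random order, and that S is the sum of the ranks of the X-sample; the function names and data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def expected_randomized_rank_sum(x, y):
    """Average of the X rank sum over all within-group orderings of tied values."""
    pooled = sorted([(v, True) for v in x] + [(v, False) for v in y],
                    key=lambda item: item[0])
    groups = []  # consecutive runs of equal values
    for item in pooled:
        if groups and groups[-1][0][0] == item[0]:
            groups[-1].append(item)
        else:
            groups.append([item])
    total, count = Fraction(0), 0
    for arrangement in product(*(permutations(g) for g in groups)):
        rank, rank_sum = 0, 0
        for group in arrangement:
            for _, is_x in group:
                rank += 1
                if is_x:
                    rank_sum += rank
        total += rank_sum
        count += 1
    return total / count


def midrank_sum(x, y):
    """Sum of the midranks of the X observations in the pooled sample."""
    pooled = x + y

    def midrank(v):
        below = sum(1 for z in pooled if z < v)
        equal = sum(1 for z in pooled if z == v)
        return Fraction(2 * below + equal + 1, 2)

    return sum(midrank(v) for v in x)


# Hypothetical discrete samples with a three-way tie across both samples.
x_sample = [1, 2, 2]
y_sample = [2, 3, 3]
print(expected_randomized_rank_sum(x_sample, y_sample), midrank_sum(x_sample, y_sample))
```

The enumeration visits every tie-breaking order, so it is only practical for a handful of observations.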


3.3  TIES  IN  SIGN  TEST. 

Let the number of observations which are positive, negative and zero be $n_+$, $n_-$ and $n_0$ respectively. In Section 3.1, we mentioned that we ignore the $n_0$ zero observations, which amounts to omitting ties from the observations. We have

$H_1 : P(X_i > 0) = P(X_i < 0)$  vs., say,

$K : P(X_i > 0) > P(X_i < 0)$

and our test procedure is to reject $H_1$ whenever $n_+$ is too large.

Let

$P(X_i > 0 \mid H_1) = p_+$ ,  $P(X_i = 0 \mid H_1) = p_0$ ;

$P(X_i > 0 \mid K) = q_+$ ,  $P(X_i = 0 \mid K) = q_0$ ,  $P(X_i < 0 \mid K) = q_-$ .


Let us consider the conditional distribution of $n_+$ given $n_0 = c$. Under $H_1$,

(3.3.1)
$$P(n_+ = x \mid n_0 = c) = P_{H_1}(x) = \binom{n-c}{x} \Big( \frac{1}{2} \Big)^{n-c} ;$$
under $K$,

(3.3.2)
$$P(n_+ = x \mid n_0 = c) = P_K(x) = \binom{n-c}{x} \Big( \frac{q_-}{q_+ + q_-} \Big)^{n-c} \Big( \frac{q_+}{q_-} \Big)^{x} ,$$
$x = 0, 1, \ldots, n-c$. Therefore,
$$\frac{P_K(x)}{P_{H_1}(x)} = h(c) \Big( \frac{q_+}{q_-} \Big)^{x} ,$$
which is a strictly increasing function of $x$. Therefore, by the Neyman-Pearson lemma, the unique most powerful (conditional) test is given by

(3.3.3)
$$n_+ > k(n_0) ,$$
where the cutoff point $k(n_0)$ is, of course, the one corresponding to $B(n-c, \tfrac{1}{2})$.
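A minimal sketch of how the conditional cutoff $k(n_0)$ in (3.3.3) and the corresponding conditional power could be computed from the binomial distributions in (3.3.1) and (3.3.2). It assumes a non-randomized test, so the cutoff is taken as the smallest $k$ whose tail probability does not exceed $\alpha$; the function names and numerical values are hypothetical.

```python
from math import comb


def binom_tail(m, p, k):
    """P(B > k) for B distributed as Binomial(m, p)."""
    return sum(comb(m, x) * p ** x * (1 - p) ** (m - x) for x in range(k + 1, m + 1))


def conditional_cutoff(n, c, alpha):
    """Smallest k with P(n_+ > k | n_0 = c) <= alpha under H_1, cf. (3.3.1)."""
    m = n - c
    for k in range(-1, m + 1):
        if binom_tail(m, 0.5, k) <= alpha:
            return k


def conditional_power(n, c, alpha, q_plus, q_minus):
    """P(n_+ > k(c) | n_0 = c) under K; n_+ is Binomial(n - c, q_+/(q_+ + q_-))."""
    k = conditional_cutoff(n, c, alpha)
    return binom_tail(n - c, q_plus / (q_plus + q_minus), k)


# Hypothetical example: n = 12 observations, c = 2 zeros, level 0.05.
cutoff = conditional_cutoff(12, 2, 0.05)
power = conditional_power(12, 2, 0.05, q_plus=0.6, q_minus=0.2)
print(cutoff, round(power, 4))
```

Because the binomial distribution is discrete, the attained conditional size is generally below the nominal $\alpha$ unless boundary randomization is added.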

The test (3.3.3) amounts to "omitting the ties from observations". Let us compare this method with randomization. The $n_0$ zeros are divided into two parts according to the outcome of a random experiment, and suppose $n_+^r$ of them are assigned to the positive part. The random variable $n_+^* = n_+ + n_+^r$ is, under $H_1$, $B(n, \tfrac{1}{2})$ and we can use the test

(3.3.4)  $n_+^* > k$ ,

without any concern about the unknown $p_0$.

THEOREM 3.2: The non-randomized test (3.3.3) is a uniformly more powerful conditional (given $n_0 = c$) test (against the one-sided alternative K) than the randomized test (3.3.4).

PROOF: Let $n_0 = c$, and let $p(y)$ be the frequency distribution of $B(c, \tfrac{1}{2})$. The joint (conditional) distribution of $n_+$ and $n_+^r$ is $P_{H_1}(x) \cdot p(y)$ under $H_1$ and $P_K(x) \cdot p(y)$ under K. The ratio of the two is $P_K(x) / P_{H_1}(x)$ and hence (3.3.3) is also the unique most powerful (conditional) test based on $n_+$, $n_+^r$ and $n_+^*$. □

In the above, we have omitted the zeros altogether. Recently, Krauth (1973) has proposed a test procedure which does not ignore the zeros.

THEOREM 3.3 (Krauth): A UMP test for testing $H_1$ against K with a known constant $p_0 = q_0$ is given by

(3.3.5)
$$n_+ + \tfrac{1}{2} n_0 > k_n(p_0) .$$


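PROOF: Let us consider the distribution of

(3.3.6)
$$n_+ - n_- = 2 n_+ + n_0 - n = 2 \big( n_+ + \tfrac{1}{2} n_0 \big) - n ;$$
under K,
$$P_K(x) = P(n_+ - n_- = x) = \sum_{n_+ - n_- = x} \frac{n!}{n_+!\, n_-!\, n_0!}\; q_+^{n_+} q_-^{n_-} q_0^{n_0} = q_0^{n}\, y^{x} \sum_{i=0}^{n} \binom{n}{i} \binom{n-i}{i-x} z^{i-x} ,$$
and under $H_1$,
$$P_{H_1}(x) = P(n_+ - n_- = x) = p_0^{n}\, y_0^{x} \sum_{i=0}^{n} \binom{n}{i} \binom{n-i}{i-x} z_0^{i-x} ,$$
$x = -n, -n+1, \ldots, n$, where $y = q_+/q_0$, $z = q_+ q_-/q_0^2$, $y_0 = p_+/p_0$, $z_0 = p_+^2/p_0^2$, and binomial coefficients with a lower index outside the admissible range are taken to be zero. Therefore, for $p_0 = q_0$,

(3.3.7)
$$\frac{P_K(x)}{P_{H_1}(x)} = \Big( \frac{y}{y_0} \Big)^{x} \frac{A_n(z, x)}{A_n(z_0, x)}$$
with

(3.3.8)
$$A_n(z, x) = \sum_{i=0}^{n} \binom{n}{i} \binom{n-i}{i-x} z^{i-x} .$$

If we prove that (3.3.7) is a strictly increasing function of $x$ (which we do in the following lemma), then by the Neyman-Pearson lemma the test (3.3.5) is a UMP test for known $p_0 = q_0$, and hence the result. □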


LEMMA 3.2: $P_K(x) / P_{H_1}(x)$, (3.3.7), is a strictly increasing function of $x$ for $x = -n+1, -n+2, \ldots, n$.

PROOF: As $(y/y_0)^x$ is a strictly increasing function of $x$ and $z < z_0$, it suffices to prove that

(3.3.8)
$$\frac{A_n(z, x)}{A_n(z_0, x)} > \frac{A_n(z, x-1)}{A_n(z_0, x-1)} ,$$
for $z < z_0$, $x = -n+1, -n+2, \ldots, n$. Or, equivalently,

(3.3.9)
$$\frac{A_n(z, x)}{A_n(z, x-1)} > \frac{A_n(z_0, x)}{A_n(z_0, x-1)}$$
for $z < z_0$, $x = -n+1, -n+2, \ldots, n$. We prove (3.3.9) by showing that the derivative of $H(z, x) = A_n(z, x) / A_n(z, x-1)$ with respect to $z$ is negative for $z > 0$, $x = -n+1, -n+2, \ldots, n$. For $x \le 0$ we have
$$A_n(z, x) = \sum_{i=0}^{m} \binom{n}{i} \binom{n-i}{i-x} z^{i-x}$$
with $m = [(n+x)/2]$. Therefore it is enough to show that




m  m 


I  i  m-dOQ  .  ^ 

i=0  j=0  j  j 


nwn-iwnwn-j  )  z±+j  <  Q 


with  m'  =  [(n+x-l)/2]  .  Or, 


m  m’+l 


(3.3.10) 


,wn.,n-i.  .  n  wn-J+\  1+j-l 


i=l  j=l  J 


<  0 


We consider only terms for which $i, j \in \{1, 2, \ldots, m\}$, since the terms with $i = 0$ or $j = m+1$ are negative anyhow. For $i = j$, the summand vanishes. For the sum of the two terms with $(i_1, j_1) = (s, t)$, $(i_2, j_2) = (t, s)$; $s, t \in \{1, 2, \ldots, m\}$, we get


-(^>0000 


X 


n+x+1 


s+t-1 


(n-2t+x+l) (n-2s+x+l) 


which is negative for all $s, t \in \{1, 2, \ldots, m\}$. This completes the proof for $x \le 0$. For $x > 0$ we have
$$A_n(z, x) = \sum_{i=x}^{m} \binom{n}{i} \binom{n-i}{i-x} z^{i-x} .$$


We  complete  the  proof  proving 


m  m’+l 


nwii-iw  n  wn”j+l\  i+j“l 


1=1  j=i 


<  o 


the  same  way  as  (3.3.11). 


□ 


Putter [24] has shown that $T_n = (2 n_+ + n_0 - n) / (n - n_0)^{1/2}$ is asymptotically N(0,1) as $n \to \infty$. In virtue of Theorem 3.3, we can now state the following result.

THEOREM 3.4. An asymptotically UMP test for testing $H_1$ against K, under the restriction $p_0 = q_0$, is given by

(3.3.11)  $T_n = (2 n_+ + n_0 - n) / (n - n_0)^{1/2} > k$

where the cutoff point k corresponds to the N(0,1) distribution.
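The statistic in (3.3.11) is straightforward to compute. The sketch below evaluates $T_n$ from the three counts and compares it with the upper $\alpha$-point of the standard normal distribution; the function names and the example counts are hypothetical, and at least one non-zero observation is assumed so that the denominator is positive.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_statistic(n_plus, n_minus, n_zero):
    """T_n = (2 n_+ + n_0 - n) / (n - n_0)^(1/2), as in (3.3.11)."""
    n = n_plus + n_minus + n_zero
    return (2 * n_plus + n_zero - n) / sqrt(n - n_zero)


def rejects(n_plus, n_minus, n_zero, alpha=0.05):
    """One-sided large-sample test: reject when T_n exceeds the N(0,1) cutoff."""
    k = NormalDist().inv_cdf(1 - alpha)
    return sign_test_statistic(n_plus, n_minus, n_zero) > k


# Hypothetical counts: 15 positive, 5 negative and 5 zero observations.
t_n = sign_test_statistic(15, 5, 5)
print(round(t_n, 3), rejects(15, 5, 5))
```

Since $2n_+ + n_0 - n = n_+ - n_-$ and $n - n_0 = n_+ + n_-$, the statistic depends on the zeros only through the counts of the non-zero observations.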

3.4:  TREATMENT  OF  TIES  IN  WILCOXON  1-SAMPLE  (SIGNED  RANK)  TEST. 

In §3.1, we dropped the zeros from the sample and then ranked the rest of the observations, as suggested by Wilcoxon (1945). Pratt (1959) has suggested a different procedure; in this section we review the criteria behind it.

The following three requirements have been suggested for a test when there are 0's.

(i) Increasing the observed values shall not make a significantly positive sample insignificant nor an insignificant sample significantly negative.

(ii) Assuming that the distribution of the observations has a center of symmetry $\mu$, those values of $\mu$ which are not rejected shall form an interval.

(iii) A sample shall be judged significantly positive if, when the 0's are included in the ranking, the sample is significantly positive whatever signs are attached to the ranks of the 0's; similarly for significantly negative and not significant.

Pratt (1959) points out that none of these three conditions (which are reasonable and are satisfied when there are no zeros) is satisfied when we use Wilcoxon's procedure. The two methods of handling zeros have been compared in [4] for the case when the underlying distributions are discontinuous, and we discuss these in Section 4.5.


CHAPTER  4 


ASYMPTOTIC  RELATIVE  EFFICIENCY  (ARE) 


In  this  chapter  we  examine  asymptotic  efficiencies  of  the 
linear  rank  tests  for  randomness  and  symmetry  with  particular  attention 
paid  to  the  three  methods  of  handling  ties,  discussed  in  Section  2.2. 
This  is  studied  using  ARE. 


4.1  EFFICIENCY. 

The asymptotic power of a test against a given alternative provides a good clue to the large sample operating characteristic of the test. Asymptotic efficiency gives a comparative measure of the asymptotic power of a test relative to a most powerful test or relative to a standard test. In the latter case we call it asymptotic relative efficiency. We consider the asymptotic efficiency as defined in Hajek and Sidak ((1967) p. 267) and asymptotic relative efficiency as in Hodges and Lehmann (1956).


Assume that an asymptotically most powerful test for $H_0$ against $q$ is based on a statistic $S_0$, where $S_0$ is asymptotically normal $(0, \sigma_0^2)$ under the null hypothesis and asymptotically normal $(\mu_0, \sigma_0^2)$ under the alternative. Further, let us consider another test for $H_0$ against $q$ based on $S$, which is asymptotically normal $(0, \sigma^2)$ and $(\mu, \sigma^2)$ under $H_0$ and $q$ respectively. Then the asymptotic powers of the $S_0$-test and the $S$-test equal
$$1 - \Phi(k_{1-\alpha} - \mu_0 \sigma_0^{-1}) \quad \text{and} \quad 1 - \Phi(k_{1-\alpha} - \mu \sigma^{-1}) ,$$
respectively. The expression

(4.1.1)
$$e = \Big( \frac{\mu \sigma_0}{\mu_0 \sigma} \Big)^2$$
is called the asymptotic efficiency of the $S$-test (it is the squared ratio of the standardized shifts $\mu/\sigma$ and $\mu_0/\sigma_0$ that appear in the two asymptotic powers given above).
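To make the roles of the quantities in (4.1.1) concrete, the sketch below evaluates the two asymptotic powers and the efficiency $e$ for given values of $\mu$, $\sigma$, $\mu_0$ and $\sigma_0$. The function names and the numerical values are hypothetical; $k_{1-\alpha}$ is taken to be the $(1-\alpha)$-quantile of the standard normal distribution.

```python
from statistics import NormalDist


def asymptotic_power(mu, sigma, alpha):
    """1 - Phi(k_{1-alpha} - mu/sigma) for a one-sided level-alpha test."""
    nd = NormalDist()
    k = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf(k - mu / sigma)


def asymptotic_efficiency(mu, sigma, mu0, sigma0):
    """e = (mu * sigma0 / (mu0 * sigma))^2, as in (4.1.1)."""
    return (mu * sigma0 / (mu0 * sigma)) ** 2


# Hypothetical values: the competing test has a smaller standardized shift.
mu0, sigma0 = 3.0, 1.0
mu, sigma = 2.7, 1.0
e = asymptotic_efficiency(mu, sigma, mu0, sigma0)
power_best = asymptotic_power(mu0, sigma0, 0.05)
power_other = asymptotic_power(mu, sigma, 0.05)
print(round(e, 4), round(power_best, 4), round(power_other, 4))
```

An efficiency below one corresponds to a smaller standardized shift and hence to a lower asymptotic power at the same level.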

Now, let $\beta_n(\theta)$ and $\beta_n^*(\theta)$ denote the power functions of two tests, say $A$ and $A^*$, based on the same set of $n$ observations, against a family of alternatives labelled by $\theta$, and let $\theta_0$ be the value of $\theta$ specified by the hypothesis. We shall assume that all tests are at the same level of significance $\alpha$. Let $\beta$ be a specified power with $\alpha < \beta < 1$. Consider a sequence of alternatives $\theta_n$ such that

(4.1.2)  $\beta_n(\theta_n) \to \beta$  as  $n \to \infty$

and a sequence $n^* = h(n)$ such that

(4.1.3)  $\beta_{n^*}^*(\theta_n) \to \beta$  as  $n \to \infty$ .

Then if

(4.1.4)
$$e_{A^*, A} = \lim_{n \to \infty} \frac{n}{n^*}$$
exists and is independent of $\alpha$, $\beta$ and the particular sequences $\{\theta_n\}$ and $\{h(n)\}$ chosen, $e_{A^*, A}$ is defined to be the asymptotic relative efficiency (ARE) of the test $A^*$ with respect to the test $A$. Methods of obtaining the limit (4.1.4) in different situations are available in the literature (see for example, Hodges and Lehmann (1956)). ARE is useful for problems where optimum tests either do not exist or are not available.


We  shall  use  the  form  of  asymptotic  efficiency  as  described  in 
Hajek  and  Sidak  [9]  (pp.  267-70). 


4.2  ASYMPTOTIC  DISTRIBUTION  UNDER  NULL  HYPOTHESES. 


To calculate the asymptotic efficiencies let us first examine the asymptotic distribution of linear rank statistics under $H_0$ and $H_1$, as discussed in [3].

The following theorems present conditions under which

(4.2.1)
$$\frac{\bar S - E(\bar S \mid \tau)}{[V(\bar S \mid \tau)]^{1/2}} \to N(0,1) ,$$
where $E(\bar S \mid \tau)$ and $V(\bar S \mid \tau)$ are as in (3.1.1).


Let $\phi(u)$ denote an arbitrary real valued function defined on the interval $0 < u < 1$ and let

(4.2.2)
$$0 < \int_0^1 (\phi(u) - \bar\phi)^2\, du < \infty , \quad \text{where } \bar\phi = \int_0^1 \phi(u)\, du ,$$

(4.2.3)
$$\sum_{i=1}^{n} (C_i - \bar C)^2 \Big/ \max_{1 \le i \le n} (C_i - \bar C)^2 \to \infty .$$

THEOREM 4.1: Under $H_0$, if conditions (4.2.2), (4.2.3) and

(4.2.4)
$$\int_0^1 \big( a(n T_n^{-1}(u), \tau) - \phi(u) \big)^2\, du \xrightarrow{P} 0$$
hold then (4.2.1) follows. Here, $T_n(u) = \frac{1}{n} \{\text{\# of } R_i\text{'s} \le un\}$ and the inverse is defined by $f^{-1}(t) = \inf\{x \mid f(x) \ge t\}$, for a real valued function $f$.


PROOF: Let us consider the random variables $Y_i = F(X_i)$, which under $H_0$ are i.i.d. with some cdf $G(u)$. Let $W_1, W_2, \ldots, W_n$ be uniform random variables which are also independent of the $Y_i$. Let $G(\{\cdot\})$ denote the measure induced by $G(u)$ on any set $\{\cdot\}$ of real numbers. Then $G(\{y\}) = P(Y = y)$ at discontinuity points of $G(u)$, and equals zero elsewhere. Now we will prove that the random variables $U_i = Y_i - W_i\, G(\{Y_i\})$ are mutually independent with uniform distribution on (0,1). Let
$$a(u) = G(G^{-1}(u)) - G(\{G^{-1}(u)\}) \quad \text{and}$$

(4.2.5)
$$b(u) = G(G^{-1}(u)) .$$
Then
$$P(U_i \le u) = P\big( Y_i \le b(u),\ W_i\, G(\{G^{-1}(u)\}) \ge b(u) - u \big) .$$
If $G(u) = u$ then $b(u) = u$ and $P(U_i \le u) = P(Y_i \le b(u)) = u$. If $G(u) < u$, then $G(u)$ is constant on the interval $[a(u), b(u))$ and $W_i\, G(\{G^{-1}(u)\})$ is uniformly distributed on $(0, b(u) - a(u))$. And we have

(4.2.6)
$$P(U_i \le u) = P(U_i \le a(u)) + P(a(u) < U_i \le u) = a(u) + P(Y_i = b(u))\, P\Big( W_i \ge \frac{b(u) - u}{b(u) - a(u)} \Big) = u .$$

It is shown in [9], p. 153 that under the assumptions (4.2.2) and (4.2.3) the random variable $T_c / \sigma_c$, where

(4.2.7)
$$T_c = \sum_{i=1}^{n} (C_i - \bar C)\, \phi(U_i) \quad \text{and} \quad \sigma_c^2 = \sum_{i=1}^{n} (C_i - \bar C)^2 \int_0^1 (\phi(u) - \bar\phi)^2\, du ,$$
has asymptotically a standard normal distribution.

It is also shown on p. 160 that $S_\phi / \sigma_c$, where

(4.2.8)
$$S_\phi = \sum_{i=1}^{n} (C_i - \bar C)\, a_\phi(R_i^*) ; \quad a_\phi(i) = E\{\phi(U_i) \mid R_i^* = i\} , \quad R_i^* = \text{rank of } U_i ,$$
satisfies

(4.2.9)
$$E\Big\{ \frac{(S_\phi - T_c)^2}{\sigma_c^2} \Big\} \to 0 .$$
Consequently, $S_\phi / \sigma_c \to N(0,1)$ under (4.2.2) and (4.2.3). By (3.1.1) we have

(4.2.10)
$$E\{[\bar S - E(\bar S \mid \tau) - S_\phi]^2 \mid \tau\} = E\Big\{ \Big[ \sum_{i=1}^{n} (C_i - \bar C)\big( a(R_i, \tau) - a_\phi(R_i^*) \big) \Big]^2 \Big| \tau \Big\}$$
$$= \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \sum_{j=1}^{n} [a(r_j, \tau) - a_\phi(r_j^*)]^2$$
$$= \frac{n}{n-1} \sum_{i=1}^{n} (C_i - \bar C)^2 \int_0^1 [a(n T_n^{-1}(u), \tau) - a_\phi(1 + [un])]^2\, du .$$

Now,

(4.2.11)
$$E\Big\{ \frac{[\bar S - E(\bar S \mid \tau) - S_\phi]^2}{\sigma_c^2} \Big\} \to 0$$
if the integral in (4.2.10) converges to zero in probability. But the integral in (4.2.10) is less than or equal to
$$2 \int_0^1 [a(n T_n^{-1}(u), \tau) - \phi(u)]^2\, du + 2 \int_0^1 [a_\phi(1 + [un]) - \phi(u)]^2\, du .$$
The first integral goes to zero by hypothesis and the second by Theorem b of [9], p. 158. The rest of the proof follows on the same lines as in [9], p. 161. □
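The randomization device used in the proof, $U_i = Y_i - W_i\, G(\{Y_i\})$, can be illustrated by simulation. The sketch below draws from a purely discrete distribution, forms $Y_i$ as the cdf value at the observed atom, subtracts an independent uniform fraction of the jump, and checks two summary properties of the uniform distribution on (0,1). The three-point distribution, the seed and the function name are hypothetical choices for illustration only.

```python
import random


def uniformize(sample, atoms, probs, rng):
    """Return U_i = Y_i - W_i * G({Y_i}) for a purely discrete sample."""
    jump = dict(zip(atoms, probs))  # G({y}): probability mass at each atom
    cdf, total = {}, 0.0
    for atom, p in zip(atoms, probs):
        total += p
        cdf[atom] = total           # Y = F(X): cdf value at the atom
    return [cdf[x] - rng.random() * jump[x] for x in sample]


rng = random.Random(0)              # fixed seed so the run is reproducible
atoms, probs = [0, 1, 2], [0.5, 0.3, 0.2]
sample = rng.choices(atoms, weights=probs, k=20000)
u = uniformize(sample, atoms, probs, rng)

mean_u = sum(u) / len(u)
frac_below_quarter = sum(x <= 0.25 for x in u) / len(u)
print(round(mean_u, 3), round(frac_below_quarter, 3))
```

Each atom with mass $p$ is spread uniformly over an interval of length $p$ ending at its cdf value, which is why the resulting values fill (0,1) without gaps.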

When there are tied observations in the data, ranks may be assigned by one of the three methods described in Section 2.2. Let us state the particular forms of Theorem 4.1 in these different situations.
Average Score Method: Let $\phi_a(u)$ be $\phi(u)$ averaged over the intervals in which $G(u)$ is constant valued:

(4.2.12)
$$\phi_a(u) = \begin{cases} \phi(u) & \text{if } G(\{G^{-1}(u)\}) = 0 , \\[4pt] \dfrac{1}{b(u) - a(u)} \displaystyle\int_{a(u)}^{b(u)} \phi(t)\, dt & \text{otherwise,} \end{cases}$$
where $a(u)$ and $b(u)$ are the same as in (4.2.5) and are the left and right end points of the interval containing $u$.

COROLLARY 4.1: Under $H_0$, if (4.2.2) holds, the scores $a(i)$ satisfy

(4.2.13)
$$\int_0^1 \big( a(1 + [un]) - \phi(u) \big)^2\, du \to 0 ,$$
and if $\phi_a(u)$ is square integrable and non-constant over (0,1), then (4.2.1) holds for the average scores defined by (2.2.2).

PROOF: The proof follows from Theorem 4.1 and the fact that (4.2.13) implies (4.2.4) ([3], p. 1112). □
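For the Wilcoxon score function $\phi(u) = u$ and a purely discrete $G$, the averaged function $\phi_a$ of (4.2.12) is piecewise constant, equal to the midpoint of each discontinuity interval. The sketch below builds these pieces and checks that the variance of $\phi_a$ equals $1/12 - \sum_k p_k^3/12$, the continuous analogue of the tie correction in (3.1.4). The function names and the atom probabilities are hypothetical; the identity itself follows from $\int_a^b (u - \tfrac{a+b}{2})^2\, du = (b-a)^3/12$.

```python
from fractions import Fraction


def wilcoxon_phi_a(probs):
    """Pieces (a, b, value): phi_a equals `value` on the interval (a, b]."""
    pieces, left = [], Fraction(0)
    for p in probs:
        right = left + p
        # the average of phi(u) = u over (left, right] is the midpoint
        pieces.append((left, right, (left + right) / 2))
        left = right
    return pieces


def variance_of_phi_a(probs):
    """Integral of (phi_a(u) - 1/2)^2 over (0, 1)."""
    half = Fraction(1, 2)
    return sum((b - a) * (value - half) ** 2 for a, b, value in wilcoxon_phi_a(probs))


# Hypothetical purely discrete distribution with three atoms.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
reduction = sum(p ** 3 for p in probs) / 12
print(variance_of_phi_a(probs), Fraction(1, 12) - reduction)
```

The larger the atoms of $G$, the larger the reduction in the variance of the averaged score function.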


Midrank Method: Let $\{I_k\}_{k>0}$ denote the countable set of discontinuity intervals $(a(u), b(u)]$, where $a(u)$ and $b(u)$ are defined as in (4.2.5) for each discontinuity point of $G(u)$. Let

(4.2.14)
$$\phi_m(u) = \begin{cases} \phi(u) & \text{if } u \text{ is in a continuity interval,} \\ \phi(\operatorname{med} I_j) & \text{if } u \text{ is in a discontinuity interval } I_j , \end{cases}$$
where $\operatorname{med} I_j$ refers to the midpoint of $I_j$, $(a(u) + b(u))/2$.

COROLLARY 4.2: Let $H_0$ be true. If (4.2.2) and (4.2.13) hold, $\phi_m(u)$ is square integrable and non-constant over (0,1), $\{\operatorname{med} I_k\}_{k>0}$ are continuity points of $\phi(u)$, and

(4.2.15)
$$a(1 + [un]) \to \phi(u) , \quad \text{for } 0 < u < 1 ,$$
then (4.2.1) follows for the midrank scores (2.2.4).

PROOF: It suffices to prove that (4.2.4), which takes the form
$$\int_0^1 \big( a(n T_n^{-1}(u); \tau) - \phi_m(u) \big)^2\, du \xrightarrow{P} 0 ,$$
holds for the scores defined by (2.2.4). For the outline of the proof of the above, we refer to [3], p. 1113. □

Randomized  Ranks: 

COROLLARY 4.3: Under $H_0$, if (4.2.2), (4.2.3) and (4.2.13) hold, then
(4.2.1) follows for the scores given by

$$a(R_i,\tau) = a(R_i^*),$$

where $R_i^*$ are randomized ranks.

PROOF: Since $a(nT_n^{-1}(u),\tau) = a(1+[un])$, (4.2.13) implies (4.2.4)
and hence the result. □


Let $R_i^+$ be as defined in Section 2.4 and

$$T_n^+(u) = \frac{1}{n}\,\{\#\text{ of } R_i^+\text{'s} \le un\}.$$

THEOREM 4.2: Let $\phi^+(u)$ be a square integrable (on $0 \le u \le 1$)
function with

(4.2.16)
$$\int_{F^+(0)}^1 [\phi^+(u)]^2\,du > 0$$

and let $H_1$ be true. If

(4.2.17)
$$\int_{\tau_0/n}^1 \bigl[a(n(T_n^+)^{-1}(u);\tau) - \phi^+(u)\bigr]^2\,du \xrightarrow{P} 0$$

holds, then $S/\sigma_\tau \to N(0,1)$, where $\sigma_\tau^2 = V(S \mid \tau)$
and $S$ is the same as defined in (2.4.2).


PROOF: As in Theorem 4.1, let $Y_i^+ = F^+(|X_i|)$, where $F^+$ denotes
the cdf of $|X_i|$. Let $U_i^+ = Y_i^+ - W_i\,G(\{Y_i^+\})$, where $G(u)$
is the cdf of $Y_i^+$ and $W_1,\ldots,W_n$ are iid uniform on (0,1). Let

$$a^+(i) = E[\phi^+(U_1^+) \mid R_1^* = i]$$

and

$$S_\phi = \sum_{i=1}^n a^+(R_i^*)\,\operatorname{sign} X_i,$$

where $R_i^*$ is the rank of $U_i^+$. Then $S_\phi/\sigma_\phi \to N(0,1)$
as $n \to \infty$ [see [27], Theorem 2], where

(4.2.18)
$$\sigma_\phi^2 = n\int_{F^+(0)}^1 [\phi^+(u)]^2\,du.$$

Now we show that $E\{(S-S_\phi)^2/\sigma_\phi^2\} \to 0$. We have

(4.2.19)
$$\begin{aligned}
E\{(S-S_\phi)^2 \mid \tau\} &= \operatorname{Var}(S-S_\phi \mid \tau) \\
&= \sum_{r_i > \tau_0} [a(r_i,\tau) - a^+(r_i)]^2 \\
&= n\int_{\tau_0/n}^1 \bigl(a(n(T_n^+)^{-1}(u),\tau) - a^+(1+[un])\bigr)^2\,du \\
&\le 2n\int_{\tau_0/n}^1 \bigl(a(n(T_n^+)^{-1}(u),\tau) - \phi^+(u)\bigr)^2\,du
 + 2n\int_{\tau_0/n}^1 \bigl(a^+(1+[un]) - \phi^+(u)\bigr)^2\,du .
\end{aligned}$$

The first integral in (4.2.19) tends to 0 by (4.2.17) and the second
converges to zero by Theorem V.1.4 of [9]. Therefore,

$$E\left\{\frac{(S-S_\phi)^2}{\sigma_\phi^2}\right\} \to 0 .$$


The proof is completed by the fact that $\sigma_\tau^2/\sigma_\phi^2 \to 1$
(see [9], p. 161). □


As  in  the  case  of  tests  for  randomness  we  can  prove  results 
similar  to  Corollaries  4.1,  4.2,  and  4.3  in  this  case  also.  Results 
along  these  lines  for  purely  discrete  distribution  functions  are  given 
in  [27]. 


REMARK 4.1: No such results for  and  are available in the literature.
In the case of the k-sample test for randomness we do have similar results
(see Conover [3]).


4.3  ASYMPTOTIC  DISTRIBUTION  UNDER  CONTIGUOUS  ALTERNATIVES. 

Let us first note that the locally most powerful conditional
rank test for $H_0$ and $H_1$ is a linear rank test, under certain
regularity conditions [see [3], Theorems 6.1 and 7.1]. We shall now discuss
the asymptotic distribution of $S$ under contiguous alternatives [in both
cases: for testing randomness and symmetry].


Let us consider a distribution function $F(x,\theta)$ with parameter
$\theta$. Let $f(x,\theta)$ represent the Radon-Nikodym derivative of
$F(x,\theta)$ with respect to $F(x,\theta_0)$ and assume this exists. We
define the generalized Fisher's information

(4.3.1)
$$I(F,\theta) = \int_{-\infty}^{\infty} \left[\frac{(\partial/\partial\theta)\,f(x,\theta)}{f(x,\theta)}\right]^2 dF(x,\theta).$$


The distribution function of $Y = F(X;\theta)$ is denoted by
$G(u;\theta)$, where $F(x;\theta)$ is the distribution function of $X$.
Let the distribution function of the $X$'s under $H_0$ be denoted by
$F(x;\theta_0)$ and consider the alternative

$H_{an}$: $X_1,\ldots,X_n$ are independent and $X_i$ is
distributed according to $F(x;\theta_i)$.


The asymptotic distribution of $S$ is found under the conditions

(4.3.2)
$$\max_{1 \le i \le n} (\theta_i - \theta_0)^2 \to 0$$

and

(4.3.3)
$$\lim_{n\to\infty} I(F,\theta_0) \sum_{i=1}^n (\theta_i - \theta_0)^2 = b^2 \quad \text{for } 0 < b^2 < \infty,$$

where $I(F,\theta)$ satisfies

$$0 < \lim_{\theta\to\theta_0} I(F,\theta) = I(F,\theta_0) < \infty .$$

Let also

(4.3.4)
$$\frac{\partial}{\partial\theta} f(x,\theta)\Big|_{\theta=\theta_0} = \lim_{\theta\to\theta_0} \frac{f(x,\theta) - f(x,\theta_0)}{\theta - \theta_0}$$

exist and

(4.3.5)
$$f(x,\theta_0) = \lim_{\theta\to\theta_0} f(x,\theta)$$

exist almost everywhere with respect to $F(x,\theta_0)$. We shall omit
the double subscript implied by conditions (4.3.2) and (4.3.3) in order
to take the limit. Let


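(4.3.6)
$$L_\theta = \prod_{i=1}^n \frac{f(X_i,\theta_i)}{f(X_i,\theta_0)}$$

be the likelihood ratio and consider the statistics

(4.3.7)
$$W_\theta = 2\sum_{i=1}^n \left\{\left[\frac{f(X_i,\theta_i)}{f(X_i,\theta_0)}\right]^{1/2} - 1\right\}$$

and

(4.3.8)
$$T_\theta = \sum_{i=1}^n (\theta_i - \theta_0)\,\phi(Y_i,F,\theta_0),$$

where we write $\phi(u,F,\theta_0)$ for

(4.3.9)
$$\phi(u,F,\theta_0) = \frac{(\partial/\partial\theta)\,f(F^{-1}(u;\theta_0),\theta)\big|_{\theta=\theta_0}}{f(F^{-1}(u;\theta_0),\theta_0)} .$$
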

LEMMA 4.1: Conditions (4.3.2) through (4.3.5) imply $T_\theta \to N(0,b^2)$
under $H_0$.

PROOF: The proof is omitted (see [3], Theorem 8.1). □


LEMMA 4.2: Under the conditions of Lemma 4.1, we have

(4.3.10)
$$\log L_\theta - T_\theta + \tfrac{1}{2}b^2 \xrightarrow{P} 0$$

and

(4.3.11)
$$\log L_\theta \to N(-\tfrac{1}{2}b^2,\, b^2)$$

under $H_0$.

PROOF: See Conover (1973a), Theorem 8.2. □


Let

(4.3.12)
$$S' = S - E\{S \mid \tau\} = \sum_{i=1}^n (C_i - \bar{C})\,a(R_i,\tau).$$

The limiting distribution of $S'$ is already given in Theorem 4.1.


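THEOREM 4.3: Let $\phi(u)$ be a non-constant square integrable function on
$0 \le u \le 1$, and let

(4.3.13)
$$\int_0^1 \bigl(a(nT_n^{-1}(u),\tau) - \phi(u)\bigr)^2\,du \xrightarrow{P} 0$$

hold under $H_0$. Then if

(4.3.14)
$$\sum_{i=1}^n (C_i - \bar{C})^2 \Big/ \max_{1 \le i \le n} (C_i - \bar{C})^2 \to \infty$$

holds, the conditions of Lemma 4.1 imply that $S'$ is asymptotically
$N(\mu_\theta, \sigma^2)$ under $H_{an}$, where

(4.3.15)
$$\mu_\theta = \sum_{i=1}^n (C_i - \bar{C})(\theta_i - \theta_0) \int_0^1 \phi(u)\,\phi(u,F,\theta_0)\,du$$

and

(4.3.16)
$$\sigma^2 = \sum_{i=1}^n (C_i - \bar{C})^2 \int_0^1 (\phi(u) - \bar\phi)^2\,du .$$
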


PROOF: We shall outline the proof. From (4.2.9) and (4.2.11), $S'$ and
$T_c$ are asymptotically equivalent under $H_0$. This and the first
result of Lemma 4.2 imply that the bivariate random variables
$(S', \log L_\theta)$ and $(T_c, T_\theta - b^2/2)$ converge in probability
to the same limit. Under $H_0$, by Theorem 4.1 and Lemma 4.1, we have
$T_c \to N(0,\sigma^2)$ and $T_\theta \to N(0,b^2)$. Note also that

$$T_\theta = \sum_{i=1}^n (\theta_i - \theta_0)\,\phi(U_i,F,\theta_0).$$

The covariance of $T_c$ and $T_\theta$ is

(4.3.17)
$$\operatorname{cov}(T_c,T_\theta) = \sum_{i=1}^n (C_i - \bar{C})(\theta_i - \theta_0) \int_0^1 \phi(u)\,\phi(u,F,\theta_0)\,du$$

because $E\{T_\theta\} = 0$. The rest of the proof, that $(T_c,T_\theta)$ is
asymptotically bivariate normal, is the same as in [9], p. 218. This
implies that $(S', \log L_\theta)$ is asymptotically bivariate normal under
$H_0$ and that the parameters satisfy the conditions of LeCam's third
lemma, p. 208 of [9], and so $S'$ is asymptotically normal
$(\mu_\theta, \sigma^2)$. □


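Now we state an analogous result under $H_1$, the proof of
which is similar to that of the above theorem. Let

(4.3.18)
$$\phi^+(u,F,\theta_0) = \frac{\partial}{\partial\theta}\, f\bigl(F^{-1}(\tfrac{1}{2} + \tfrac{1}{2}u),\theta\bigr)\Big|_{\theta=\theta_0} .$$

Let $F(x,\theta)$ be a symmetric function for $\theta = \theta_0$ (when
$H_1$ is true) and define the likelihood function

(4.3.19)
$$L_\Delta = \prod_{i=1}^n \frac{f(X_i,\theta_0+\Delta)}{f(X_i,\theta_0)} .$$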


Assume

(4.3.20)
$$\Delta \to 0,$$

(4.3.21)
$$\lim_{n\to\infty} I(F,\theta_0) \cdot n\Delta^2 = b^2 \quad \text{for } 0 < b^2 < \infty,$$

and

(4.3.22)
$$0 < \lim_{\theta\to\theta_0} I(F,\theta) = I(F,\theta_0) < \infty,$$

where $I(F,\theta)$ is defined by (4.3.1). As $T_\theta$ in (4.3.8), let

$$T_\Delta = \sum_{i=1}^n \Delta\,\phi(F(X_i),F,\theta_0) = \sum_{i=1}^n \Delta\,\frac{\partial}{\partial\theta} f(X_i,\theta)\Big|_{\theta=\theta_0} .$$

THEOREM 4.4. Let $F(x,\theta)$ satisfy (4.3.4), (4.3.5), (4.3.22) and

$$\frac{\partial}{\partial\theta} f(-x,\theta)\Big|_{\theta=\theta_0} = -\frac{\partial}{\partial\theta} f(x,\theta)\Big|_{\theta=\theta_0} .$$

If (4.2.17) holds under $H_1$ for some square integrable (on (0,1))
function $\phi^+(u)$ that satisfies (4.2.16), then (4.3.20) and (4.3.21)
imply that the sequence $S$ is asymptotically $N(\mu_\Delta,\sigma^2)$ under
$H_a$, where $\sigma^2$ is given by (4.2.18) and $\mu_\Delta$ by

(4.3.23)
$$\mu_\Delta = n\Delta \int_{F^+(0)}^1 \phi^+(u)\,\phi^+(u,F,\theta_0)\,du,$$

and the sequence $S/\sigma_\tau$ is asymptotically $N(\mu_\Delta/\sigma, 1)$.

PROOF: The proof is similar to the proof of Theorem 4.3 (see Theorem
9.1 of [3]) and hence we omit it. □


4.4  ASYMPTOTIC  EFFICIENCY. 


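When testing $H_0$, if there is convergence

(4.4.1)
$$\frac{\sum_{i=1}^n (C_i - \bar{C})(\theta_i - \bar\theta)}{\bigl(\sum_{i=1}^n (C_i - \bar{C})^2 \sum_{i=1}^n (\theta_i - \bar\theta)^2\bigr)^{1/2}} \to \rho_2,$$

then the asymptotic efficiency of the test using $S$ is defined (see [9],
p. 268) as

(4.4.2)
$$e = \rho_1^2\,\rho_2^2,$$

where $\rho_1$ is given by

(4.4.3)
$$\rho_1 = \frac{\int_0^1 \phi(u)\,\phi(u,F,\theta_0)\,du}{\bigl(\int_0^1 (\phi(u) - \bar\phi)^2\,du \int_0^1 \phi^2(u,F,\theta_0)\,du\bigr)^{1/2}} .$$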


If we want to compare two tests for which $\phi(u)$ differs, we
calculate the asymptotic relative efficiency (ARE). In the usual case the
$C_i$'s are the same; then the ARE of the test using $\phi_1(u)$, say,
relative to the test using $\phi_2(u)$ is

(4.4.4)
$$\mathrm{ARE} = \frac{\bigl(\int_0^1 \phi_1(u)\,\phi(u,F,\theta_0)\,du\bigr)^2 \int_0^1 (\phi_2(u) - \bar\phi_2)^2\,du}{\bigl(\int_0^1 \phi_2(u)\,\phi(u,F,\theta_0)\,du\bigr)^2 \int_0^1 (\phi_1(u) - \bar\phi_1)^2\,du} .$$

Now let us mention how $\rho_1$ changes for the different methods of
handling ties (see [3]). When we use the average score method, $\rho_1$ is
given by

(4.4.5)
$$\rho_1 = \frac{\int_0^1 \phi_a(u)\,\phi(u,F,\theta_0)\,du}{\bigl(\int_0^1 (\phi_a(u) - \bar\phi)^2\,du \; I(F,\theta_0)\bigr)^{1/2}},$$

where $\phi_a(u)$ is defined by (4.2.12). Using the midrank method we have

(4.4.6)
$$\rho_1 = \frac{\int_0^1 \phi_m(u)\,\phi(u,F,\theta_0)\,du}{\bigl(\int_0^1 (\phi_m(u) - \bar\phi_m)^2\,du \; I(F,\theta_0)\bigr)^{1/2}},$$

where $\phi_m(u)$ is defined by (4.2.14). By using randomized ranks, we get

(4.4.7)
$$\rho_1 = \frac{\int_0^1 \phi(u)\,\phi(u,F,\theta_0)\,du}{\bigl(\int_0^1 (\phi(u) - \bar\phi)^2\,du \; I(F,\theta_0)\bigr)^{1/2}} .$$

Let us find the ARE of an average score test (A) relative
to a randomized rank test (R). The numerators of the two $\rho_1$'s in
(4.4.5) and (4.4.7) are identical, because $\phi(u,F,\theta_0)$ is constant
over the same intervals in which $\phi(u)$ is averaged to give
$\phi_a(u)$. Hence

(4.4.8)
$$\mathrm{ARE}_{A,R} = \frac{\int_0^1 (\phi(u) - \bar\phi)^2\,du}{\int_0^1 (\phi_a(u) - \bar\phi)^2\,du} .$$


Note  that  (4.4.8)  is  greater  than  or  equal  to  one  with  equality 
only  if  <|>(u)  is  constant  in  the  same  interval  where  G(u)  is  constant. 
Theorem  6  of  Putter  (1955)  given  below  as  Theorem  4.5,  is  a  special  case 


of  (4.4.8). 


THEOREM 4.5: Under the regularity conditions under which (4.4.8) holds,
in the same set up as in Section 3.2, the ARE of the randomized test with
respect to the averaged score test (or midrank test, as they are the same
for the Wilcoxon test) is $1 - \sum_i p_i^3$.


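PROOF:

(4.4.9)
$$\mathrm{ARE}_{R,A} = \frac{\int_0^1 (\phi_a(u) - \bar\phi)^2\,du}{\int_0^1 (\phi(u) - \bar\phi)^2\,du} .$$

In the Wilcoxon test $\phi(u) = u$, and hence the denominator of (4.4.9) is

(4.4.10)
$$\int_0^1 (u - \tfrac{1}{2})^2\,du = \tfrac{1}{12} .$$

Let $q_i = p_1 + p_2 + \ldots + p_i$. Then $\phi_a(u)$ is given by (using
(4.2.12))

(4.4.11)
$$\phi_a(u) = \frac{1}{p_i}\int_{q_{i-1}}^{q_i} t\,dt = \frac{1}{p_i}\,\frac{q_i^2 - q_{i-1}^2}{2} = \frac{q_i + q_{i-1}}{2} = \frac{2q_i - p_i}{2}, \qquad q_{i-1} < u \le q_i .$$

Therefore the numerator of (4.4.9) is

$$\int_0^1 (\phi_a(u) - \bar\phi)^2\,du = \sum_i \int_{q_{i-1}}^{q_i} \Bigl(\frac{2q_i - p_i}{2} - \frac{1}{2}\Bigr)^2 du = \sum_i \frac{(2q_i - p_i - 1)^2}{4}\,p_i ,$$

so that

$$\mathrm{ARE}_{R,A} = 3\sum_i (2q_i - p_i - 1)^2\,p_i .$$

We will be through if we prove that
$3\sum_i (2q_i - p_i - 1)^2 p_i = 1 - \sum_i p_i^3$, or

(4.4.12)
$$\sum_i p_i^3 = 1 - 3\sum_i (2q_i - p_i - 1)^2\,p_i .$$

For only one point mass (i.e. $k = 1$), (4.4.12) is trivial. Let us
assume that it is true for $k = \ell$, i.e. it holds for all probability
distributions with $\ell$ points having positive probability:

(4.4.13)
$$\sum_{i=1}^{\ell} p_i^3 = 1 - 3\sum_{i=1}^{\ell} (2q_i - p_i - 1)^2\,p_i .$$

We want to prove that it is true for any distribution with $\ell+1$
points having positive probability. Writing $q = 1 - p_{\ell+1}$,

$$\sum_{i=1}^{\ell+1} p_i^3 = q^3 \sum_{i=1}^{\ell} \Bigl(\frac{p_i}{q}\Bigr)^3 + p_{\ell+1}^3
= q^3\Bigl\{1 - 3\sum_{i=1}^{\ell} \Bigl(\frac{2q_i}{q} - \frac{p_i}{q} - 1\Bigr)^2 \frac{p_i}{q}\Bigr\} + p_{\ell+1}^3 ,$$

using (4.4.13) for the probability distribution with $p_i/q$,
$i = 1,\ldots,\ell$, as probabilities
($\sum_{i=1}^{\ell} p_i/q = (1 - p_{\ell+1})/q = 1$). Therefore,

$$\begin{aligned}
\sum_{i=1}^{\ell+1} p_i^3 &= (1 - p_{\ell+1})^3 + p_{\ell+1}^3 - 3\sum_{i=1}^{\ell} (2q_i - p_i - q)^2\,p_i \\
&= (1 - p_{\ell+1})^3 + p_{\ell+1}^3 - 3\sum_{i=1}^{\ell} (2q_i - p_i - 1 + p_{\ell+1})^2\,p_i \\
&= (1 - p_{\ell+1})^3 + p_{\ell+1}^3 - 3\sum_{i=1}^{\ell} (2q_i - p_i - 1)^2\,p_i - 3p_{\ell+1}^2 (1 - p_{\ell+1}) \\
&\qquad - 6p_{\ell+1} \sum_{i=1}^{\ell} (2q_i - p_i - 1)\,p_i ,
\end{aligned}$$

or

(4.4.14)
$$\sum_{i=1}^{\ell+1} p_i^3 = 1 - 3\sum_{i=1}^{\ell+1} (2q_i - p_i - 1)^2\,p_i + 6p_{\ell+1}^3 - 6p_{\ell+1}^2 - 6p_{\ell+1} \sum_{i=1}^{\ell} (2q_i - p_i - 1)\,p_i .$$

It can easily be shown that for every $n$, with $k = n+1$,

(4.4.15)
$$\sum_{i=1}^{n} (2q_i - p_i - 1)\,p_i = p_{n+1}^2 - p_{n+1} .$$

Therefore by (4.4.14) and (4.4.15), we have

$$\sum_{i=1}^{\ell+1} p_i^3 = 1 - 3\sum_{i=1}^{\ell+1} (2q_i - p_i - 1)^2\,p_i .$$

Hence, by induction, the result follows. □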
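The identity (4.4.12) behind Theorem 4.5 is easy to check numerically. The following is a minimal sketch, not part of the original argument: it evaluates both sides with exact rational arithmetic for an arbitrary probability vector. The function names and the example vector are illustrative choices.

```python
from fractions import Fraction
from itertools import accumulate


def are_randomized_vs_average(p):
    """3 * sum_i (2 q_i - p_i - 1)^2 p_i, with q_i the cumulative sums of p."""
    q = list(accumulate(p))
    return 3 * sum((2 * qi - pi - 1) ** 2 * pi for qi, pi in zip(q, p))


def putter_closed_form(p):
    """1 - sum_i p_i^3, the closed form stated in Theorem 4.5."""
    return 1 - sum(pi ** 3 for pi in p)


def partial_sum_identity(p):
    """Both sides of (4.4.15) for p = (p_1, ..., p_{n+1})."""
    q = list(accumulate(p))
    lhs = sum((2 * qi - pi - 1) * pi for qi, pi in zip(q[:-1], p[:-1]))
    rhs = p[-1] ** 2 - p[-1]
    return lhs, rhs


if __name__ == "__main__":
    p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    print(are_randomized_vs_average(p), putter_closed_form(p))
    print(partial_sum_identity(p))
```

Because the arithmetic is exact, agreement of the two functions on a given vector confirms the identity for that vector, though of course it does not replace the induction.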


In the case of tests for symmetry, the asymptotic efficiency
becomes (Conover (1973a))

(4.4.16)
$$e = \frac{\bigl[\int_{F^+(0)}^1 \phi^+(u)\,\phi^+(u,F,\theta_0)\,du\bigr]^2}{\int_{F^+(0)}^1 [\phi^+(u)]^2\,du \int_{F^+(0)}^1 [\phi^+(u,F,\theta_0)]^2\,du} .$$

As in the case of $H_0$, we can discuss the asymptotic relative
efficiencies of different methods of handling ties and prove Putter's
(1955) Theorem 2 as a special case.


4.5  TWO  METHODS  OF  HANDLING  TIES  AT  ZERO:  COMPARISON. 

Two methods of handling ties at zero in the Wilcoxon signed rank
test have been mentioned in Chapter 3 (Pratt's method and Wilcoxon's
method). In this section, we compare the asymptotic efficiencies
of the two, as given by (4.4.16), and show that each one performs better
under different conditions (see [4]).

Let $X_1, X_2, \ldots, X_n$ be a random sample with discrete
distribution function $F(x,\theta)$. Let $p(x,\theta)$ represent the
probability function. In order to apply the results of the preceding
sections, $F(x,\theta)$ should satisfy the following conditions:

(i) $p(x,\theta_0) = p(-x,\theta_0)$ for some $\theta = \theta_0$;

(ii) $f(x,\theta) = \dfrac{p(x,\theta)}{p(x,\theta_0)}$ exists almost everywhere with respect to $F(x,\theta_0)$;

(iii) $\dfrac{\partial}{\partial\theta} f(x,\theta)\Big|_{\theta=\theta_0}$ exists almost everywhere with respect to $F(x,\theta_0)$;

(iv) $\lim_{\theta\to\theta_0} f(x,\theta) = 1$ almost everywhere with respect to $F(x,\theta_0)$;



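(v) $\lim_{\theta\to\theta_0} I(F,\theta) = I(F,\theta_0)$, where $I(F,\theta)$ is the generalized Fisher information (4.3.1).
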

The above conditions do hold for the examples under consideration.
Let

(4.5.1)
$$T = \frac{\sum_{i=1}^n a(R_i^+)\,\operatorname{sign} X_i}{\bigl[\sum_{R_i^+ > \tau_0} a^2(R_i^+)\bigr]^{1/2}},$$

where $a(i)$ are scores. If the scores satisfy the conditions of Theorem
4.2, then (4.5.1) is asymptotically standard normal (in Conover (1973b),
the statistic (2.1) is incorrect and (4.5.1) is the corrected form of it).
In the following, we calculate the asymptotic efficiency of $T$ from
(4.4.16) in different situations.

(a) W: Wilcoxon's test with Pratt's method for ties at zero and
the randomized rank method for other ties: Here the scores are
$a(i) = i/(n+1)$ and converge to $\phi^+(u) = u$, $0 < u < 1$.
Hence (4.4.16) becomes


(4.5.2) 


(b) W: Wilcoxon's test with Pratt's method for ties at zero
and the averaged (= midrank) rank method for other ties:

Since the midrank method is used, the scores do not converge
to $u$, because of the discontinuities in the distribution function. In
this case the scores converge to


Then (4.5.3) and (4.4.16) give


(4.5.4) 


ew 


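(c) $W_0$: Wilcoxon's test with zeros discarded and the randomized
rank method for other ties:

The scores start at $1/(n+1)$ for non-zero observations rather
than at about $p_0$ as in the previous case. The scores converge to

(4.5.5)
$$\phi^+_{W_0}(u) = \begin{cases} 0, & 0 < u \le p_0, \\[4pt] \dfrac{u - p_0}{1 - p_0}, & p_0 < u < 1, \end{cases}$$

which gives

(4.5.6)
$$e_{W_0} = \frac{3\bigl[\int_{p_0}^1 u\,\phi^+(u,F,\theta_0)\,du - p_0 \int_{p_0}^1 \phi^+(u,F,\theta_0)\,du\bigr]^2}{(1 - p_0)^3 \int_{p_0}^1 [\phi^+(u,F,\theta_0)]^2\,du} .$$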


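(d) $W_0$: Wilcoxon's test with zeros discarded and the midrank method
for other ties:

From (4.5.3) and (4.5.5), we have scores converging to

(4.5.7)
$$\phi^+_{W_0}(u) = \begin{cases} 0, & 0 < u \le p_0, \\[4pt] \dfrac{\phi^+_W(u) - p_0}{1 - p_0}, & p_0 < u \le 1, \end{cases}$$

where $\phi^+_W(u)$ in the numerator is the limit in (4.5.3), which gives

(4.5.8)
$$e_{W_0} = \frac{3\bigl[\int_{p_0}^1 u\,\phi^+(u,F,\theta_0)\,du - p_0 \int_{p_0}^1 \phi^+(u,F,\theta_0)\,du\bigr]^2}{\bigl[(1 - p_0)^3 - \sum_{i \ne 0} p_i^3\bigr] \int_{p_0}^1 [\phi^+(u,F,\theta_0)]^2\,du} .$$
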


Having discussed the asymptotic efficiencies of different methods of
handling ties in Wilcoxon's signed rank test, we give two examples, one
of which favours Pratt's method and the other discarding zeros.


EXAMPLE 4.1: Let us consider the discrete uniform distribution. Under the
null hypothesis the probabilities are equal and symmetric about zero,
namely

(4.5.9)
$$p(x,\theta_0) = \frac{1}{2k+1} \quad \text{for } x = 0, \pm 1, \ldots, \pm k,$$

and zero elsewhere. Let the alternative be

$$P(X = x) = \begin{cases} \dfrac{1 + x\theta}{2k+1}, & x = 0, \pm 1, \ldots, \pm k, \quad 0 < |\theta| < \dfrac{1}{k}, \\[4pt] 0 & \text{elsewhere.} \end{cases}$$

Then $H_1$ is $\theta = \theta_0 = 0$. We have

$$\phi^+(u,F,\theta_0) = F^{-1}\Bigl[\frac{u+1}{2}\Bigr] = \begin{cases} 0, & 0 < u \le p_0 = \dfrac{1}{2k+1}, \\[4pt] i, & \dfrac{2i-1}{2k+1} < u \le \dfrac{2i+1}{2k+1} . \end{cases}$$

Therefore (4.5.2), (4.5.4), (4.5.6) and (4.5.8) become, respectively,

$$e_W = \frac{4k^2 + 6k + 2}{4k^2 + 6k + 3},$$

$$e_W = 1,$$

$$e_{W_0} = \frac{16k^3 + 8k^2 - 7k + 1}{16k^3 + 8k^2}, \quad \text{and}$$

$$e_{W_0} = \frac{16k^3 + 8k^2 - 7k + 1}{16k^3 + 8k^2 - 4k - 2} .$$


For  k  =  1  ,  W  and  Wq  are  equivalent  with  efficiencies  1 


For  k  >  1  ,  we  have 


W  <  W  <  W  <  W 
o  o 


Hence Pratt's method seems to be preferred. A table showing
the efficiencies for different values of k is given in [4].
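The four efficiencies of Example 4.1 can be recomputed directly from (4.4.16). The sketch below assumes only what the example states: the jump points $(2i \pm 1)/(2k+1)$, the value $\phi^+(u,F,\theta_0) = i$ on the $i$-th interval, and the limiting score functions of cases (a) to (d) (identity, interval midpoint, and their zero-discarded rescalings). The dictionary keys such as `pratt_randomized` are hypothetical labels introduced here for the four cases, not notation from the text.

```python
from fractions import Fraction as Fr


def efficiencies(k):
    """Asymptotic efficiencies (4.4.16) for the discrete uniform example.

    On each interval (a, b] the limiting score is written alpha*u + beta,
    so all integrals can be evaluated exactly with rational arithmetic.
    """
    m = 2 * k + 1
    p0 = Fr(1, m)
    schemes = {
        # (a) Pratt, randomized ranks: phi(u) = u
        "pratt_randomized": lambda a, b: (Fr(1), Fr(0)),
        # (b) Pratt, midranks: phi is the interval midpoint
        "pratt_midrank": lambda a, b: (Fr(0), (a + b) / 2),
        # (c) zeros discarded, randomized ranks: (u - p0) / (1 - p0)
        "discard_randomized": lambda a, b: (1 / (1 - p0), -p0 / (1 - p0)),
        # (d) zeros discarded, midranks: (midpoint - p0) / (1 - p0)
        "discard_midrank": lambda a, b: (Fr(0), ((a + b) / 2 - p0) / (1 - p0)),
    }
    out = {}
    for name, scheme in schemes.items():
        cross = norm_phi = norm_psi = Fr(0)
        for i in range(1, k + 1):
            a, b = Fr(2 * i - 1, m), Fr(2 * i + 1, m)
            al, be = scheme(a, b)
            # exact integrals of (al*u + be) and (al*u + be)^2 over (a, b]
            int_phi = al * (b**2 - a**2) / 2 + be * (b - a)
            int_phi2 = (al**2 * (b**3 - a**3) / 3
                        + al * be * (b**2 - a**2) + be**2 * (b - a))
            cross += i * int_phi          # integral of phi * phi^+(u, F, theta_0)
            norm_phi += int_phi2          # integral of phi^2
            norm_psi += i * i * (b - a)   # integral of phi^+(u, F, theta_0)^2
        out[name] = cross**2 / (norm_phi * norm_psi)
    return out


if __name__ == "__main__":
    for k in (1, 2, 3):
        ranked = sorted(efficiencies(k).items(), key=lambda kv: kv[1])
        print(k, [(name, str(val)) for name, val in ranked])
```

Sorting the values for a given $k > 1$ shows the ordering of the four procedures in this example, with both Pratt variants ahead of their zero-discarding counterparts.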


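EXAMPLE 4.2: Let

$$P(X = x) = \begin{cases} \dbinom{2k}{k+x}\,\theta^{k+x} (1-\theta)^{k-x}, & x = 0, \pm 1, \ldots, \pm k, \\[4pt] 0 & \text{elsewhere.} \end{cases}$$

The test of symmetry tests $\theta = \tfrac{1}{2}$.

In this case we have, for $0 < u < 1$ and $1 \le i \le k$,

$$\phi^+(u,F,\tfrac{1}{2}) = 4F^{-1}\Bigl[\frac{u+1}{2}\Bigr] = \begin{cases} 0, & 0 < u \le P_0 = \dbinom{2k}{k} \bigl(\tfrac{1}{2}\bigr)^{2k}, \\[4pt] 4i, & P_{i-1} < u \le P_i, \end{cases}$$

where

$$P_i = p_0 + 2\sum_{j=1}^{i} \binom{2k}{k+j} \bigl(\tfrac{1}{2}\bigr)^{2k} .$$

Let

$$I = 2k - 2\sum_{j=0}^{k-1} P_j^2 .$$

Then we have

$$e_W = \frac{3I^2}{8k\,(1 - p_0^3)},$$

$$e_W = \frac{3I^2}{8k\,\bigl(1 - p_0^3 - \sum_{i \ne 0} p_i^3\bigr)},$$

$$e_{W_0} = \frac{3\,(I - 4k\,p_0^2)^2}{8k\,\bigl[(1 - p_0)^3 - \sum_{i \ne 0} p_i^3\bigr]} .$$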


The above formulae do not suggest any obvious ordering, but
some numerical results for different k (see [4]) indicate that $W_0$ is
preferred.
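Numerical values for Example 4.2 can be produced along the following lines. This is a sketch under stated assumptions rather than a reproduction of the table in [4]: the closed forms used are those obtained by substituting $\phi^+(u,F,\tfrac12) = 4i$ into (4.4.16) for the four score functions of Section 4.5, with $\int [\phi^+]^2 du = 8k$ and $\int \phi^+ du = 4k p_0$. The function name and the dictionary keys are hypothetical labels for the four cases.

```python
from fractions import Fraction as Fr
from math import comb


def binomial_example(k):
    """Exact efficiencies for the symmetric binomial example at theta = 1/2."""
    half_pow = Fr(1, 2 ** (2 * k))
    # null probabilities p(x, 1/2) for x = -k, ..., k
    p = {x: comb(2 * k, k + x) * half_pow for x in range(-k, k + 1)}
    p0 = p[0]
    # cumulative points P_0, ..., P_k of the distribution of |X|
    P = [p0]
    for i in range(1, k + 1):
        P.append(P[-1] + 2 * p[i])
    I = 2 * k - 2 * sum(Pj ** 2 for Pj in P[:k])
    cubes = sum(p[x] ** 3 for x in p if x != 0)   # sum over i != 0 of p_i^3
    num_pratt = 3 * I ** 2
    num_discard = 3 * (I - 4 * k * p0 ** 2) ** 2
    return {
        "pratt_randomized": num_pratt / (8 * k * (1 - p0 ** 3)),
        "pratt_midrank": num_pratt / (8 * k * (1 - p0 ** 3 - cubes)),
        "discard_randomized": num_discard / (8 * k * (1 - p0) ** 3),
        "discard_midrank": num_discard / (8 * k * ((1 - p0) ** 3 - cubes)),
    }


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, {name: float(val) for name, val in binomial_example(k).items()})
```

For $k = 2$, for instance, the zero-discarding midrank procedure comes out ahead of Pratt's method with midranks, in line with the remark that $W_0$ is preferred in this example.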


CHAPTER  5 


GENERAL  REMARKS 


In this chapter we give some general remarks which may be of
some use to a practical statistician. The main concern while using a
rank test when ties are present is that the null distribution of the
statistic depends upon the pattern of ties and is usually difficult to
compute. Let us note the following points.

(i) As proved in Section 4.4, the test statistic based on the
average score method is asymptotically at least as efficient as that based
on the randomized rank procedure. But as the tables for each vector of
ties are different in the former case, it may not always be practicable to
use the average score method. However, we suggest the use of the average
score method in the case of large samples, i.e., whenever a large sample
approximation is used, with the test statistic modified according to the
change in the variance (for example, see (3.1.6)).

(ii) In case we do not have the modified form of the statistic, or the
limiting distribution of the test statistic when ties are present is
difficult to compute, we should note that using the original test
statistic and ranking ties by the average score method increases the level
of significance above the one indicated by the tables (Hajek (1969)).

(iii) We also suggest the use of Computer Tables. By this we mean
the use of computer programmes, which should be available in readily
usable form, to calculate the probability (given a vector of ties)
$P[S > s \mid \tau]$, where $s$ is the observed value of $S$ and $\tau$ is
the vector of ties. By this procedure we do not have to print huge numbers
of tables which may be used rarely. Klotz (1966) has given an algorithm to
compute approximate probabilities in the case of the Wilcoxon two sample
test. The approximation is quite good for large n ($n \ge 5$), but for
$n < 5$ it fails like other approximations.

(iv) In tests of symmetry we have the problem of zero observations
apart from the usual ties. Between the two methods, Pratt's and
Wilcoxon's, as discussed in Section 4.5, it is hard to recommend one over
the other. As argued by Pratt, omitting zeros from the observations
(Wilcoxon's method) seems to cause some loss of information. Hence,
intuitively, we suggest Pratt's method (ranking zeros along with the other
observations and then dropping them from the rank vector).

(v) If the tied observations are very few, it might be much easier to
use the randomized rank procedure, and hence the usual tables (see
Section 3.1), without losing much power, than to use the average score or
midrank method, which requires separate tabulation.
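As an illustration of the "Computer Tables" idea in remark (iii), the conditional tail probability of the Wilcoxon two-sample statistic given the observed pattern of ties can be obtained for small samples by brute-force enumeration. The sketch below is not Klotz's (1966) algorithm; it simply enumerates all assignments of the pooled midranks to the first sample, and it computes $P[S \ge s \mid \tau]$ (with a non-strict inequality, a choice made here). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def midranks(values):
    """Midranks of `values`: tied observations share the average of their ranks."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)          # first (1-based) position of each value
        count[v] = count.get(v, 0) + 1    # size of its tie group
    return [Fraction(2 * first[v] + count[v] - 1, 2) for v in values]


def exact_upper_tail(x, y):
    """P[S >= s | tie pattern], S being the sum of midranks of the x-sample.

    Under the null hypothesis every subset of len(x) pooled observations is
    equally likely to form the first sample, so the probability is a count
    over all such subsets. Feasible only for small samples.
    """
    ranks = midranks(list(x) + list(y))
    m = len(x)
    s_obs = sum(ranks[:m])
    total = hits = 0
    for idx in combinations(range(len(ranks)), m):
        total += 1
        hits += sum(ranks[i] for i in idx) >= s_obs
    return Fraction(hits, total)


if __name__ == "__main__":
    print(exact_upper_tail([3, 5], [1, 3, 4]))
```

The cost grows as the binomial coefficient of the pooled sample size, so in practice such a routine would be used only where the large sample approximation is doubtful.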

